Dropping historical entries in a computed satellite?

mrcol · 19 January 2023 14:58

Hi everyone!

I would like to load a computed satellite into a business vault but am interested only in the most recent versions of the attributes for each business key.

Is it OK from the DV2.0 standard perspective, not to include historical values in a computed satellite (and update the fields in-place)?

Thanks for help,
Mark

Christopher · 23 January 2023 18:10

Hi, Mark. That’s fine. DV2.0 allows you to determine how much history is kept. In your case the answer is zero! And the raw satellites still have all the history in case you ever need to recreate what the current-record was for a given day.

patrickcuba · 5 February 2023 21:29

No, do not delete history,

Business Vault is the auditable facts but based on derived content.

If you’re using Snowflake, here’s a technique to get the current record every time: Data Vault on Snowflake: Querying really BIG satellite tables | by Patrick Cuba | Snowflake | Medium

Nat · 22 February 2023 17:40

A view that gives you the latest (along with an end of time date) would be better than deleting.

If you have a reasonable sized data set (few million rows) then a simple way is to rank / partition over the keys order so that the latest = 1 and then where clause = 1 to get the latest.

I’ve seen another version that is generated by Scalefree’s fork of dbtvault which uses lead and a timestamp offset to get the latest record as is-current. It seems to work but I haven’t deeply analysed it.

WITH

end_dated_source AS (

SELECT
    hk_party_h,
    hd_party_s,
    rsrc,
    ldts,
    COALESCE(LEAD(ldts - INTERVAL '1 MICROSECOND') OVER (PARTITION BY hk_party_h ORDER BY ldts),TO_TIMESTAMP('8888-12-31T23:59:59', 'YYYY-MM-DDTHH24:MI:SS')) AS ledts,

(rest of select clause)
FROM foo

)

SELECT
hk_party_h,
hd_party_s,
rsrc,
ldts,
ledts,
CASE WHEN ledts = TO_TIMESTAMP(‘8888-12-31T23:59:59’, ‘YYYY-MM-DDTHH24:MI:SS’)
THEN TRUE
ELSE FALSE
END AS IS_CURRENT,
(rest of select clause)
FROM end_dated_source
);

Patrick’s approach is likely more bombproof at big scale, the views suggested may get expensive at runtime, but do your testing.

moravic · 6 November 2023 15:55

I can clearly see the benefit of this C-PIT tables if you are using Snowflake!

In our case we have an INSERT only hadoop data lake. For some fact tables with are just interest on current records, moreover we just export a few months of data to Oracle as performance layer of datamarts for BI users.

Our overnight processes build a view of current records Satellites, based on DV model metadata, (hundreds of millions of data) and build the few months fact tables (few millions of data) from them; and for some intraday loads, we just build a view of last hour data Satellites (based on DV model metadata) and build the incremental fact tables from them, and MERGE the data in Oracle.

On the other hand, we also load historical effectivity Satellites following DV2.0 as auditable data.

Is our best approach we can think of, to keep Data Vault approach and have a good performance to refresh our fact tables, especially for intraday loads.

patrickcuba · 7 November 2023 06:51

Hadoop-based would be immutable files I’d reason, and you’d need some form of SQL to query, join and make sense of the data. The beauty of using a C-PIT in Snowflake’s case is the JoinFilter — but also using a PIT in a star-join an OLAP query engine uses a hash-join to optimise performance of the query (performs a Right-Deep Join Tree that Snowflake favours). I don’t think you have the same thing on HiveSQL or SparkSQL — I might be wrong, if you do then PITs will help.

Topic		Replies	Views
How to model Data Vault when source system has a separate table with history Data Vault 2.0	4	170	3 February 2025
Satellite loading Insert statement standard Data Vault 2.0 satellite	9	1394	22 February 2023
dbtvault update a flag column in old row AutomateDV (formerly known as dbtvault)	3	554	2 November 2022
Tracking deletes in a regular satellite Data Vault 2.0	9	1448	21 June 2023
More than one record in SAT for a Business Key with the same LOAD_DATE in a single load Data Vault 2.0	32	1748	11 October 2023

Dropping historical entries in a computed satellite?

Related topics