Hey there,
I'd like some advice on Data Vault modelling and the load strategy for our satellite (SAT) tables.
In our load process, we load from the data lake into the landing layer in batches, and the landing layer is then used to load the SAT tables in our system.
The current load approach only checks whether a record from the landing table already exists in the SAT table; if it does not, it is inserted.
We are not handling state changes.
For example, if the value for a particular key changes at different points in time as
A → B → A
we only capture the A → B change in the SAT table.
So we are going to change the loading approach. We plan to introduce a delta feed (stage table) between the landing table and the SAT table:
Data Lake → Landing Table → Stage Table (Delta Feed) → SAT Table
and compare the latest record in the SAT table against the delta feed.
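To make the intended rule concrete, here is a minimal Python sketch of it (names are made up for illustration, not our real schema; DAY stands in for INSERT_TIMESTAMP and PINCODE is the only payload column):

```python
# Planned delta-load rule: insert a staged row only if its payload differs
# from the LATEST SAT row for the same key. Unlike the old exists-check,
# this captures a value changing back to an earlier state (A -> B -> A).

def delta_load(sat, stage):
    for row in stage:
        history = [r for r in sat if r["ID"] == row["ID"]]
        latest = max(history, key=lambda r: r["DAY"]) if history else None
        if latest is None or latest["PINCODE"] != row["PINCODE"]:
            sat.append(row)

sat = []
stage = [
    {"ID": 1, "DAY": 1, "PINCODE": "A"},
    {"ID": 1, "DAY": 2, "PINCODE": "B"},
    {"ID": 1, "DAY": 3, "PINCODE": "A"},  # change back to A is now detected
]
delta_load(sat, stage)
# sat now holds all three state changes: A, B, A
```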
We are also considering the dbtvault package.
But there is one problem: we often reload the landing table historically.
Example
Stage
ID, INSERT_TIMESTAMP, PINCODE, Business
1, 1 Jan, 100, 1 Jan
2, 1 Jan, 200, 1 Jan
1, 2 Jan, 200, 2 Jan

SAT Table
ID, INSERT_TIMESTAMP, PINCODE, Business
1, 1 Jan, 100, 1 Jan
2, 1 Jan, 200, 1 Jan
1, 2 Jan, 200, 2 Jan
If I reload ID 1 again from the data lake, the historical data looks like this:
Stage
ID, INSERT_TIMESTAMP, PINCODE, Business
1, 1 Jan, 100, 1 Jan
2, 1 Jan, 200, 1 Jan
1, 2 Jan, 200, 2 Jan
1, 3 Jan, 100, 1 Jan   <-- reloaded data
Delta Feed
ID, INSERT_TIMESTAMP, PINCODE, Business
1, 3 Jan, 100, 1 Jan
The load will compare against the latest record in the SAT table, treat the reloaded row as a new change, and insert a duplicate record into the SAT table:
SAT Table
ID, INSERT_TIMESTAMP, PINCODE, Business
1, 1 Jan, 100, 1 Jan
2, 1 Jan, 200, 1 Jan
1, 2 Jan, 200, 2 Jan
1, 3 Jan, 100, 1 Jan   <-- duplicate
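The failure mode in the walkthrough above can be reproduced in a few lines of Python (illustrative names only; day numbers stand in for the Jan dates):

```python
# The load rule only looks at the LATEST SAT row per key, so a historically
# reloaded old value (PINCODE 100) differs from the latest value (200) and
# is inserted again even though it is not a real state change.

sat = [
    {"ID": 1, "DAY": 1, "PINCODE": 100},
    {"ID": 2, "DAY": 1, "PINCODE": 200},
    {"ID": 1, "DAY": 2, "PINCODE": 200},
]
delta = [{"ID": 1, "DAY": 3, "PINCODE": 100}]  # reloaded row, old business value

for row in delta:
    latest = max((r for r in sat if r["ID"] == row["ID"]), key=lambda r: r["DAY"])
    if row["PINCODE"] != latest["PINCODE"]:  # looks like a change, but is a reload
        sat.append(row)

# sat now holds PINCODE 100 for ID 1 twice (days 1 and 3): a duplicate.
```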
I am currently using INSERT_TIMESTAMP as the high-water mark.
Can someone advise on this scenario, please?
Thank you