[Need Advice] Sat Table Reloading Stretegy

Hey there ,

I want some advice on the data vault modelling and load stretegy of the sat table.

In our load progress
we have load from the data lake into the landing layer in the batches and these landing layer are used to load in the sat tables in our system

And in the load approach we just looked there the data in the landing table is there is in the sat table or not if it is not there it will inserted into it.

we are not handling state changed

for example
if the for particular key changes in the different in point of time

A β†’ B β†’ A

we are only getting A β†’ B change in the SAT table

But we are going to change in the loading apprach

we thought we have delta feed in between in the sat table

Data Lake β†’ Landing Table β†’ Stage Table( Delta Feed) β†’ Sat Table

Compare latest record in the sat table and Delta Feed.

And we consider dbt vault package

But there is one problem

we do often reload in the landing table historically table

Example

Stage
ID , INSERT_TIMESTAMP , PINCODE , Business
1 , 1 Jan , 100 , 1 Jan
2 , 1 Jan , 200 , 1 Jan
1 , 2 Jan , 200 , 2 Jan

Sat Table
ID , INSERT_TIMESTAMP , PINCODE , Business
1 , 1 Jan , 100 , 1 Jan
2 , 1 Jan , 200 , 1 Jan
1 , 2 Jan , 200 , 2 Jan

if i reload the ID 1 again in the from the data lake then the historically data

Stage
ID , INSERT_TIMESTAMP , PINCODE , Business
1 , 1 Jan , 100 , 1 Jan
2 , 1 Jan , 200 , 1 Jan
1 , 2 Jan , 200 , 2 Jan
1, 3 Jan , 100 , 1 Jan < β€” Reloaded data

Delta Feed
ID , INSERT_TIMESTAMP , PINCODE , Business
1, 3 Jan , 100 , 1 Jan

It will compare the latest record in the sat table consider it has latest and reload the data duplicated record in the sat

Sat
ID , INSERT_TIMESTAMP , PINCODE , Business
1 , 1 Jan , 100 , 1 Jan
2 , 1 Jan , 200 , 1 Jan
1 , 2 Jan , 200 , 2 Jan
1, 3 Jan , 100 , 1 Jan

I am currently using INSERT_TIMESTAMP has waterlevel

Can some one advice on this scenerio pls.

Thank you

Not sure if I’ve 100% understood you correctly so correct me if I’m not answering your question.

It sounds to me like your satellite load pattern is pulling unique PK, HASHDIFF combinations but isn’t considering the sequencing of the changes, you should compare the PK and HASHDIFF to the LAG of that PK on the LDTS and see if the hash changes.

When in doubt refer to @patrickcuba’s wonderful book with examples of load patterns which probably has a more performant example of what I’ve described.

All the best Samvar! I hope you get the advice you need.

1 Like