Hope you can help
We’re trying to model inventory in a (number of) warehouses in data vault.
Each inventory position is based on a SKU, a batch numnber, a location (for example a warehouse or a transportation route), and a status (in stock, quarantined etc) at a particular point in time as well as all the values.
A batch can contain multiple SKUs so the business key is basically SKU x Batch ID.
We get from the source the full snapshot of the inventory each night - we have a complication in that the source system does not record deletions or zero-outs (it just removes the record, leaving no ‘end date’) and we don’t have a CDC so we can’t do this.
What I think we have is
hub : SKU_bk, Batch_ID_bk, ++ bk_hash ++ other hub stuff
sat: bk_hash, location, status, number units, (some other descriptive stuff) ++ other sat stuff
What I don’t understand is how to get the inventory as of today - especially given the restrictions on the source system as we can’t have a snapshot date as part of the hub BK and there is no end date on any satellites, meaning that if a position is deleted rather than zeroed out, deleted records will remain in the effective view in the satellite.
does this make sense? (allowing a snapshot date as part of a BK would be helpful but realise this isn’t allowed)
I believe Dan touches on this type of scenario in the core book. He proposes the solution of recording a ‘last seen date’ and then running a business rule on the satellite at an appropriate interval to add a ‘considered to be ended’ date in lieu of a definite end date.
If you are very sure of your source, you could implement this as “If an item hasn’t been seen today, give it an end date of today”. In case of reporting error, you could implement this as “If an item hasn’t been seen for a week, give it the end date that it was last seen”
Expanding on @will.list 's answer, Dan indeed does cover “Last Seen Date” in his Building a Scalable Data Warehouse with Data Vault 2.0 book, starting at page 100 in the section titled 220.127.116.11 Last Seen Date. Chapter 11 of the same book discusses how to populate these dates.
The section mentions that this can be done when the
Do you receive a full load each time from the source system?
Last_Seen_Dates are deprecated as of the DV2.0.2, although they would be handy on Snowflake https://patrickcuba.medium.com/data-vault-agility-on-snowflake-747a04485499
Thank you, @patrickcuba, for the update on the last seen date deprecation. Naturally, I decided to do a little digging myself on this as well.
I found an interesting post from Dan that references last seen dates and load end dates, calling them dead because they involve updates in the data vault.
To implement a 100% insert only solution instead, he recommends:
Section 5.3.5 of Dan and Michael’s book (Building a Scalable Data Warehouse with Data Vault 2.0, pg. 146-149) discusses record source tracking. The beginning of chapter 6 discusses PIT tables.
Yea because they’re expensive — my article above has a link to the standards themselves DV2.0.2