Using external load-dates

Hi all,

I’m trying to load historical data in to a data vault warehouse, and we’re considering using a source-system effective-from date as a load date. How could I go about this without breaking the load patterns?

In Data Vault 2.0 Load Datetime has a standard meaning. It is the datetime the record was loaded to the Data Warehouse. If you are batch loading you can set it to the batch datetime, or to the datetime the insert query is run.

Load Datetime is therefore always set in time sequence. You cannot receive an ‘out of sequence’ load datetime.

This notwithstanding - if you are loading historical data as a one off initialisation of data then, if a historical load datetime is not available, you can use an effective or applied date value for that historical data set.

For historical loads - HUBS and LINKS should take the earliest date stamped entry for each unique PK. Non-historised links just load deduped data. SATS should attempt to iterate through their history to record when values changed accurately and preserve their hashdiffs (assuming you have the full history of data feeds). Our dbtvault will calculate and do this iteration for you (iterate by rank materialisation).

4 Likes

Hi Neil, this is great. Is this an example of what we will cover in Feburary?

I’m certainly looking forward to February to see if topics like this come up, as they are a common issue that data vault is called upon to cover. Patrick did an excellent series on ‘time crime’ that is often referenced.

In my experience, the standard of ‘LoadDate’ absolutely needs to be there, but suffers from the overloading of the term. Is it the date the load ran (essentially a load id), the date the load was received into the data warehouse (which is usually pretty close to the date the load ran), or the date the record was applied to that table (which might be quite far away from the start of the run)?

In your case, a fourth date field would be needed - a date at which the record should be considered to be loaded. This could then be treated as the ‘load date’ in downstream rules where you would normally use actual load date.

1 Like

The February 2022 meetup will look at time data on steroids. There are multiple timelines involved in extracting and loading data and all of them could be tracked separately as metadata. I’m looking forwards to it too.

2 Likes