I am currently modelling Data Vault 2.0 tables for one of our sources, and I’m a bit stuck when it comes to handling validity dates.
What the source delivers is the relationship between persons and certificates. Each person can have multiple certificates (e.g. Person A holds both the Data Vault Practitioner and the Data Vault Architect certificates). Each of those certificates comes with a ValidFrom date (the date the certificate was obtained), a ValidTo date (the expiry date), and some additional descriptive columns (content, issuer, etc.).
The obvious choice is to model it as a link between the person hub and the certificate hub. However, I’m unsure how to store the validity dates.
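For concreteness, here is a minimal sketch of the link side of that model. It assumes a common Data Vault 2.0 hashing convention (trim and upper-case each business key, join with a `||` delimiter, MD5 hex digest); the business key values and the helper name `hash_key` are hypothetical and only for illustration.

```python
import hashlib

# Assumed hashing convention (common in Data Vault 2.0, but check your own standards):
# trim and upper-case each business key, join with a delimiter, take the MD5 hex digest.
DELIMITER = "||"


def hash_key(*business_keys: str) -> str:
    """Return a hub or link hash key for one or more business keys (hypothetical helper)."""
    normalised = DELIMITER.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


# Hypothetical business keys for Person A and one of the certificates.
person_hk = hash_key("person-a")
certificate_hk = hash_key("DV-PRACTITIONER")

# The link hash key is computed over both business keys in a fixed order
# (person first, then certificate), so each person/certificate pair gets one link row.
link_hk = hash_key("person-a", "DV-PRACTITIONER")
```

The link itself carries no dates; the open question is only where ValidFrom and ValidTo end up.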
In The Data Vault Guru, Patrick Cuba says the following:
Don’t mix with the definition of a source-supplied effectivity (sat_mdm_party_address) with data vault’s own effectivity satellite, their purpose is the same but an effectivity satellite is the result of the execution of business rules staged through the identification of a driving key; rather than the source already supplying the business rule output in raw vault. Effectivity supplied by the source is merely hashed into a record hash (hashdiff) column and loaded like any other satellite load.
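To make that first option concrete, this is a minimal sketch of loading the source-supplied dates into a link satellite as ordinary attributes. It assumes an MD5 hashdiff over the upper-cased, `||`-delimited columns; the function names, the in-memory list standing in for the satellite table, and the sample values are hypothetical.

```python
import hashlib
from datetime import date, datetime
from typing import Dict, List, Optional

DELIMITER = "||"


def compute_hashdiff(valid_from: date, valid_to: Optional[date], content: str, issuer: str) -> str:
    # The source-supplied dates are hashed together with the other descriptive columns.
    parts = [
        valid_from.isoformat(),
        valid_to.isoformat() if valid_to is not None else "",
        content.strip(),
        issuer.strip(),
    ]
    return hashlib.md5(DELIMITER.join(parts).upper().encode("utf-8")).hexdigest()


def load_link_satellite(sat: List[Dict], link_hk: str, record: Dict, load_dts: datetime) -> bool:
    """Insert a new satellite row only if the hashdiff differs from the latest row for link_hk."""
    new_hashdiff = compute_hashdiff(
        record["valid_from"], record["valid_to"], record["content"], record["issuer"]
    )
    history = [row for row in sat if row["link_hk"] == link_hk]
    latest = max(history, key=lambda row: row["load_dts"]) if history else None
    if latest is not None and latest["hashdiff"] == new_hashdiff:
        return False  # no change: nothing to insert
    sat.append({"link_hk": link_hk, "load_dts": load_dts, "hashdiff": new_hashdiff, **record})
    return True


# Usage with a hypothetical link hash key and certificate record.
sat_person_certificate: List[Dict] = []
record = {
    "valid_from": date(2021, 3, 1),
    "valid_to": date(2024, 3, 1),
    "content": "Data Vault Practitioner",
    "issuer": "Example Issuer",
}
first_load = load_link_satellite(sat_person_certificate, "link-hk-1", record, datetime(2023, 1, 1))
same_again = load_link_satellite(sat_person_certificate, "link-hk-1", record, datetime(2023, 1, 2))
renewed = dict(record, valid_to=date(2027, 3, 1))  # the certificate was renewed at the source
after_renewal = load_link_satellite(sat_person_certificate, "link-hk-1", renewed, datetime(2024, 2, 1))
```

In this reading, a renewal at the source (a new ValidTo) simply shows up as a changed hashdiff and produces a new satellite row, while an unchanged delivery is skipped.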
Building a Scalable Data Warehouse with Data Vault 2.0 by Linstedt and Olschimke states the following:
The begin and end dates are not system generated. Instead, they have to be provided from a data source, for example the audit trail of Microsoft Master Data Services, effectivity dates within the master data, change data capture (CDC) or any other audit trail from operational systems. In order to pass an audit of the data warehouse, it must be possible to trace back the dates to the source system. Effectivity satellites are only reasonable if the source system provides the effectivity dates.
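For contrast, here is a minimal sketch of the second option: a separate effectivity satellite on the link that stores the start and end dates exactly as the source delivers them. It assumes the descriptive columns (content, issuer) stay in their own link satellite and that ValidTo is inclusive (the last valid day); the class and function names and the sample rows are hypothetical. Note that this is not the driving-key effectivity satellite Cuba describes; the dates are loaded as delivered, which keeps them traceable to the source as the quote requires.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass(frozen=True)
class EffectivitySatRow:
    link_hk: str              # hash key of the person/certificate link
    load_dts: datetime        # technical load timestamp, generated by the warehouse
    start_date: date          # ValidFrom exactly as delivered by the source
    end_date: Optional[date]  # ValidTo exactly as delivered by the source (None = no expiry)


def current_row(rows: List[EffectivitySatRow], link_hk: str) -> Optional[EffectivitySatRow]:
    """Return the most recently loaded effectivity row for one link, if any."""
    candidates = [row for row in rows if row.link_hk == link_hk]
    return max(candidates, key=lambda row: row.load_dts) if candidates else None


def is_effective(rows: List[EffectivitySatRow], link_hk: str, as_of: date) -> bool:
    """Check whether the link is effective on as_of according to the latest loaded source dates."""
    row = current_row(rows, link_hk)
    if row is None:
        return False
    # Assumption: ValidTo is the last day the certificate is still valid (inclusive).
    return row.start_date <= as_of and (row.end_date is None or as_of <= row.end_date)


# Usage with a hypothetical link hash key: the second row reflects a renewal
# delivered by the source in a later load.
eff_sat = [
    EffectivitySatRow("link-hk-1", datetime(2023, 1, 1), date(2021, 3, 1), date(2024, 3, 1)),
    EffectivitySatRow("link-hk-1", datetime(2024, 2, 1), date(2021, 3, 1), date(2027, 3, 1)),
]
valid_in_2025 = is_effective(eff_sat, "link-hk-1", date(2025, 6, 1))
valid_before_obtained = is_effective(eff_sat, "link-hk-1", date(2020, 12, 31))
unknown_link = is_effective(eff_sat, "link-hk-2", date(2025, 6, 1))
```

Structurally, the only difference from the link-satellite option is where the two dates live; the load timestamp remains system generated in both cases.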
I have the feeling that the two sources contradict each other. One says that source-provided business effectivity dates are simply hashed into the hashdiff and loaded into the link satellite like any other attribute, while the other says they need to go into a separate effectivity satellite. Which approach is more reasonable?