We are doing a POC to refactor our data assets using DV 2.0 methodology and having a few questions about RV layer.
Why do we need record_source column in Satellite tables if they are already source specific Satellite tables ?
Why do we need as many satellite tables as that of the source tables? Can we combine related data together in Satellites from one source with data coming from multiple tables?
Thanks in advance !
what would you consider as record-source value?
We have member data coming from different source systems and we are using pattern SAT_Meaningful name_src for satellite tables.
For example SRC1, table name is SAT_MEMBER_SRC1 and SRC2, its SAT_MEMBER_SRC2. So, values for record_source column would be ‘SRC1’ for 1st SAT table and ‘SRC2’ for 2nd SAT table. My concern is, we already have source specific suffix within a table name, so is it okay to drop ‘record_source’ column from satellite tables or its good to have it?
Ah but you can be doing a lot more with rec-source. If the source file landed for you is the captured output of a business rule/process (which it is) then I’d add a lot more. Some of this might be available as a metadata column already, so your rec-src could look like this:
What’s more this column could be used to denote business rule version if the rule evolves and is the record itself is impacted by article 17 of gdpr
I’d never combine RAW Vault satellite tables, if one satellite table evolves and the other doesn’t
how do you then deal with an evolving satellite source and the other source hasn’t?
loading to a satellite table is hash_key + hashdiff change based, is the one source now comparing against content from another source in the same satellite, or do you now add rec-source in the comparison (only update if it is the same record source is ‘X’)
The complexity added by combining sources to one raw satellite table is just not worth it.
Keep the data vault simple and it will last longer, combining sources to a raw satellite = tech debt.