Hi communty,
you can read it in several posts all over the internet. Don’t build a Source Data Vault!
I wonder if there is a definition for a Source DV?
I would assume this term comes from the idea of building Business onthologies in a first step and then model your DV2 according to your onthology.
To me this seems to be more related to the methodology of DV (without 2) or when you build an EDWH.
I remember when I had my DV2 training in 2017 M. Olschimke, ( the Co author of the book with the fast car) told us, that building up the Raw vault is easy and can be automated in 90 % of the time. I also follow the approach to load a single stage table independently of other tables from source to the raw vault entities.
(if I do not misinterpret it, also Dan says that in his DV Modeling Specs. Chapter 2.5 Staging tables.
[https://danlinstedt.com/wp-content/uploads/2018/06/DVModelingSpecs2-0-1.pdf] )
→ so I would expect this means I have to be as close to the source as possible?
Otherwise,if I had to make sure my Raw Vault is more Business Onthology specific: I guess I’d have to join and transform data before I load it to the Raw Vault, but this would complicate the loading algorithm and would not scale in the end?!
Furthermore, Patrick Cuba says:
After all you do not purchase a tool like Salesforce for its logo! You purchase Salesforce because it mostly fulfills your business processes, the data output of which is captured into Raw Vault , the gaps modelled into Business Vault !
[You might be doing #datavault Wrong! | by Patrick Cuba | Medium]
→ To me that sounds like most source systems are very close to the Business processes anyway and thus it would make sense to stay close to these source tables?
I also have the problem that Business usually has reporting needs that are closelly related to the source systems but cannot be fulfilled on the source systems themselves (eg Salesforce, but also SAP ), maybe because of missing history or whatever reason.
So my approach would be:
1, define together with Business “good” Businesskeys and Hubs → passive integration should be fulfilled if possible at all!
2, Load for your use case each needed source table to the Raw Vault as close as possible to the Source
e.g: SAP environment, concept CUSTOMER:
- HUB_CUSTOMER: (filled by KNA1 table, which is the master table for customer data; unfortunatelly a proper BK is hard to find as the NAME FIELDS might not be unique, at least in my case it means, use the KUNNR , which is the customer number, but that seems ok as also Business refers to their customers by using that number)
- SAT_CUSTOMER_KNA1_SAPSYSTEM1: Load all columns from KNA1 into that SAT (and for convenience reasons I even add the name of the source table to my SAT.
- SAT_CUSTOMER_KNA1_SAPSYSTEM2: same as above (but might have different fields than the first SAT)
- …
So finally, is that a Source Vault or not?
I’d really be interested in other thoughts!?
Thanks a lot
Klaus