Different data modelling techniques suit different architectures. If your data architecture does not support ANSI-compliant SQL semantics on a relational database, you should probably consider something other than data vault. And yes, throwing Spark at the problem doesn't make the data storage architecture relational; it makes it a band-aid, and a very expensive one.
Why separate them? Easy. Like Kimball models, DV models must be designed to take advantage of the OLAP platform's algorithms under the hood, to name a few:
- Nested loops
- Hash-joins (build and probe)
- Data sketches like bloom filters
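To make the last item concrete, here is a minimal Python sketch of a Bloom filter, the kind of data sketch a query engine may build from the join keys on one side of a join and then use to discard rows on the other side that cannot possibly match. The bit-array size, the number of hash functions, the salted SHA-256 hashing and the sample keys are all illustrative assumptions, not how any particular OLAP platform implements it.

```python
import hashlib
from typing import Hashable, Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3) -> None:
        self.m = m_bits    # size of the bit array (assumed value)
        self.k = k_hashes  # number of hash functions (assumed value)
        self.bits = 0      # a Python int used as the bit array

    def _positions(self, item: Hashable) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item!r}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: Hashable) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: Hashable) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Build the filter from the (small) dimension's join keys...
dim_keys = ["C-001", "C-002", "C-003"]
bf = BloomFilter()
for key in dim_keys:
    bf.add(key)

# ...then use it to skip fact rows whose key cannot possibly match.
fact_keys = ["C-001", "C-999", "C-003", "C-404"]
candidates = [k for k in fact_keys if bf.might_contain(k)]
```

Keys that were added are always kept; keys that were never added are usually, but not guaranteed to be, filtered out, which is why the real join still has to run on the surviving candidates.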
The hash join is important: it's what makes joining facts and dimensions so performant, and when I see a customer build "PIT views" I know they are missing this lesson entirely.
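The build-and-probe idea behind the hash join can be sketched in a few lines of Python. This is a toy, in-memory illustration (the row dictionaries and the `customer_id` column are hypothetical), not how any particular platform implements it: the smaller input, typically a dimension, is hashed once on the join key, and the larger input, typically a fact, is streamed past that hash table so each row costs one lookup rather than a scan of the other table, as it would in a nested loop.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List

Row = Dict[str, Any]


def hash_join(build: Iterable[Row], probe: Iterable[Row], key: str) -> List[Row]:
    # Build phase: hash the smaller input (e.g. a dimension) on the join key.
    table: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    # Probe phase: stream the larger input (e.g. a fact) and look up matches.
    joined: List[Row] = []
    for row in probe:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def nested_loop_join(build: List[Row], probe: Iterable[Row], key: str) -> List[Row]:
    # For comparison: every probe row is checked against every build row.
    return [{**b, **p} for p in probe for b in build if p[key] == b[key]]


dims = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "corporate"},
]
facts = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": 5},
    {"customer_id": 1, "amount": 7},
    {"customer_id": 3, "amount": 1},  # no matching dimension row: dropped (inner join)
]
result = hash_join(dims, facts, "customer_id")
```

Both functions return the same rows here; the difference is cost. In this sketch the hash join touches each input row roughly once, while the nested loop compares every fact row with every dimension row.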
We also preach that data complexities should be shifted as far left as possible, which is the theme of this article: "The OBT Fallacy & Popcorn Analytics" by Patrick Cuba, The Modern Scientist, Medium.
Nowhere is it documented that you shouldn't have business keys in satellite tables; I encourage it. Why join to anything if you do not need to?
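A minimal sketch of that point, assuming a hypothetical customer hub and satellite (the class and column names below are illustrative, not a standard): when the satellite also carries the business key, a lookup by that key can filter the satellite directly instead of first resolving the key through the hub.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HubCustomer:            # hypothetical hub
    hub_customer_hk: str      # surrogate hash key
    customer_id: str          # business key


@dataclass(frozen=True)
class SatCustomerDetails:     # hypothetical satellite
    hub_customer_hk: str
    customer_id: str          # business key carried into the satellite
    name: str
    load_date: str


def lookup_via_hub(hubs: List[HubCustomer], sats: List[SatCustomerDetails],
                   customer_id: str) -> List[SatCustomerDetails]:
    # Without the business key in the satellite: resolve it through the hub first.
    keys = {h.hub_customer_hk for h in hubs if h.customer_id == customer_id}
    return [s for s in sats if s.hub_customer_hk in keys]


def lookup_direct(sats: List[SatCustomerDetails],
                  customer_id: str) -> List[SatCustomerDetails]:
    # With the business key in the satellite: no join needed.
    return [s for s in sats if s.customer_id == customer_id]


hubs = [HubCustomer("hk1", "C-001"), HubCustomer("hk2", "C-002")]
sats = [
    SatCustomerDetails("hk1", "C-001", "Ann", "2025-01-01"),
    SatCustomerDetails("hk1", "C-001", "Ann B.", "2025-02-01"),
    SatCustomerDetails("hk2", "C-002", "Bo", "2025-01-01"),
]
direct = lookup_direct(sats, "C-001")
```

Both lookups return the same two satellite rows for `C-001`; the direct one simply skips the extra hop.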
Why have those separate tables?
- Auditability
- Isolating PII
- Tracking true changes
- Solving model / data complexities upfront so analysts don't need to
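Tracking true changes is commonly done by hashing a record's descriptive attributes (often called a hashdiff), comparing that hash with the latest one already loaded for the same key, and inserting only when they differ. A minimal sketch, assuming SHA-256, a `||` delimiter, trimming, and NULL-as-empty-string as the normalisation rules (real implementations differ on all of these), with hypothetical column names:

```python
import hashlib
from typing import Dict, List, Mapping, Optional, Sequence


def hashdiff(record: Mapping[str, Optional[object]], columns: Sequence[str]) -> str:
    # Normalise each attribute (NULL -> "", trim whitespace) and join in a fixed order.
    payload = "||".join(
        "" if record[c] is None else str(record[c]).strip() for c in columns
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def true_changes(latest: Dict[str, str], staged: Sequence[Mapping[str, object]],
                 key: str, columns: Sequence[str]) -> List[Dict[str, object]]:
    """Return only staged rows whose attributes differ from the latest
    satellite row for the same key (new keys always count as changes)."""
    current = dict(latest)  # copy so the caller's snapshot is not modified
    inserts: List[Dict[str, object]] = []
    for rec in staged:
        hd = hashdiff(rec, columns)
        if current.get(rec[key]) != hd:  # new key or changed attributes
            inserts.append({**rec, "hashdiff": hd})
            current[rec[key]] = hd       # later staged rows compare against this one
    return inserts


cols = ["name", "city"]
latest = {"C1": hashdiff({"name": "Ann", "city": "Oslo"}, cols)}
staged = [
    {"customer_id": "C1", "name": "Ann", "city": "Oslo"},    # unchanged: skipped
    {"customer_id": "C1", "name": "Ann", "city": "Bergen"},  # true change: inserted
    {"customer_id": "C2", "name": "Bo", "city": "Oslo"},     # new key: inserted
    {"customer_id": "C1", "name": "Ann", "city": "Bergen"},  # repeat: skipped
]
rows = true_changes(latest, staged, "customer_id", cols)
```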
Coincidentally, many of these topics are exactly what I discuss here: "Data Vault is Information Mapping" by Patrick Cuba, The Modern Scientist, Medium (Mar 2025).
And it is very doable using Iceberg too: "Data Vault on Snowflake and Apache Iceberg" by Patrick Cuba, The Modern Scientist, Medium.
Why do I blog these? Because it helps when consulting with customers. How many times have you explained these concepts to an audience, only for someone at the end of the call to ask a question you already answered at the beginning? Cognitive load! Sometimes, whilst consulting, I might even say, "Ah, I have a blog for that, here you go!"
We always recommend training and coaching, but I think it is a cheat to simply say, "you're doing it wrong because you never had training," or worse, "pay me money so I can show you how to do it." Let me tell you, even trainers get it wrong. Nowhere in the training material do they tell you why the structures work as tables; they just tell you to do it. Worse still, some make problem statements like "link-satellites are dead" without providing any evidence of the problems they encountered, or spend pages discussing the "colours of DV". What is the value in that?
I have been trained and certified in both Data Vault variants, so when I'm consulting I can plainly see which DV a modeller follows just by looking at the model.
It is not helpful to go around stating that you have seen many failed data vaults when the common denominator is yourself. I'm sure you have seen success in your hook (why not also share when it has not worked?). I won't defend data vault, because I know it is not perfect, and I know where and (a lot of the time) why pitfalls occur. A common scenario I see: a customer says, "we want to build a data vault," and I ask, "why?" I would be happy to say, "no, you shouldn't," and there can be a host of reasons:
- Your architecture does not support relational semantics
- Your team lacks the maturity needed, whether technical or even business
- You're not willing to listen to the advice I am giving: you hear it, but you don't follow it (and I never say "I told you so" later; why would I?)
- It is led or owned by data engineers whose focus is on automation, not data modelling.
Check out the comment section of this article: "The Death of Hash Keys. An innovation introduced in data vault…" by Patrick Cuba, The Modern Scientist, Medium (Feb 2025).
I don't understand why people are still trying to import hash keys into Power BI or Tableau!
Where I have seen it work, and work quite well, is when customers have a clear vision of what their business architecture looks like: they are not building a DV as just another modelling technique but as a paradigm shift towards structuring data around business needs, i.e. marrying Enterprise Architecture discipline with how the data is structured.
The pitfall I see here is employee churn and knowledge retention. Enterprises must invest in the discipline to keep following the principles they established for how they build their data vault. Trust me, there are even variations within the two dominant DVs in the wild!
Have you posted your question to Dan's forum yet?