Implementing DV seems like quite a big and complex effort. So we currently think that without an automation tool (e.g. DataVault Builder, WhereScape etc.) the implementation is quite hard. What’s your opinion?
(This is one of the questions asked by one of the members during our last meetup, “5 most common challenges with Data Vault modelling”, on the 12th Jan 2022.)
Implementing DV is not really a “big and complex effort” IMHO. Yes, as it is heavily standardized in terms of patterns of loading, usage etc., it certainly lends itself to automation in pretty much every way you can think of, and this is a good thing ™
But that does not really make it complex. If you’ve done Kimball for a few years, my guess would be that you can already sketch the basic outline of a star schema model in your mind just by looking at a source system? Granted, the image you have won’t be the end result, nor perfect, but it’s hard not to see, no? Same with DV. In fact, I’d say be careful once you do start seeing a breakdown of a source table into DV, as DV should not be source-centric. Integration around CBC is what you want.
But again, not really complicated per se.
And even without a 3rd party tool, with some upfront considerations in terms of naming conventions, loading patterns, temporality/timelines, audit/metric storage etc., you can certainly “hand craft” it - using various templates to speed up the process as a bare minimum I’d say, and some consistency checks of your schema model to ensure keys/hashes/metadata columns (load date, load source etc.) all conform…
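To make the consistency-check idea concrete, here is a minimal sketch in Python. The naming convention it enforces (hub_/link_/sat_ prefixes, a hash key ending in _hk, load_date, record_source and hash_diff columns) is purely an assumption for illustration, not a standard from the thread; swap in whatever your own conventions say.

```python
from __future__ import annotations

# Hypothetical convention: metadata columns required per table type.
REQUIRED_COLUMNS = {
    "hub": {"load_date", "record_source"},
    "link": {"load_date", "record_source"},
    "sat": {"load_date", "record_source", "hash_diff"},
}


def check_table(name: str, columns: set[str]) -> list[str]:
    """Return convention violations for one table (empty list if it conforms)."""
    prefix = name.split("_", 1)[0]
    if prefix not in REQUIRED_COLUMNS:
        return [f"{name}: unknown prefix '{prefix}' (expected hub_, link_ or sat_)"]
    problems = []
    # Assumed convention: every vault table carries at least one '<entity>_hk' hash key.
    if not any(col.endswith("_hk") for col in columns):
        problems.append(f"{name}: no hash key column ending in '_hk'")
    # Report missing metadata columns in a stable (sorted) order.
    for missing in sorted(REQUIRED_COLUMNS[prefix] - columns):
        problems.append(f"{name}: missing column '{missing}'")
    return problems


# Example schema; in practice this would be read from the database catalog.
schema = {
    "hub_customer": {"customer_hk", "customer_id", "load_date", "record_source"},
    "sat_customer_details": {"customer_hk", "load_date", "name"},
}
for table, cols in schema.items():
    for problem in check_table(table, cols):
        print(problem)
```

Run as part of a build or deployment step, a check like this flags tables that drift from the agreed conventions before they reach production.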
We’ve been running a DV without automation for five years or so, and I agree that it is not really a big effort. I expect it depends on what platform you are going to use. For us, using MS SQL/SSIS, we just used the templates provided with the Dan L book. Implementation is then really just a case of cut-and-pasting the new code with tweaks.
Having said that, we have found that some standards are starting to slip in the code as both the team and our company data model are evolving. As such, we are now looking to move to automation.
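As a rough illustration of how a cut-and-paste template can be turned into something parameterised, here is a small Python sketch that fills in a hub load statement. The SQL pattern, the column names (load_date, record_source, the _hk suffix) and the function render_hub_load are assumptions made for this example; they are not the templates from the book mentioned above, and the sketch assumes the staging table already carries the computed hash key.

```python
from string import Template

# Hypothetical hub load pattern: insert business keys not yet present in the hub.
HUB_LOAD = Template("""\
INSERT INTO ${hub} (${hash_key}, ${business_key}, load_date, record_source)
SELECT DISTINCT stg.${hash_key}, stg.${business_key}, stg.load_date, stg.record_source
FROM ${staging} AS stg
WHERE NOT EXISTS (
    SELECT 1 FROM ${hub} AS hub WHERE hub.${hash_key} = stg.${hash_key}
);""")


def render_hub_load(hub: str, staging: str, business_key: str) -> str:
    """Fill in the hub template; the hash key name is derived from the hub name."""
    entity = hub.removeprefix("hub_")  # requires Python 3.9+
    return HUB_LOAD.substitute(
        hub=hub,
        staging=staging,
        business_key=business_key,
        hash_key=f"{entity}_hk",
    )


print(render_hub_load("hub_customer", "stg_crm_customer", "customer_id"))
```

Because every hub load comes out of the same template, a change to the standard is made in one place instead of in every pasted copy, which is one way to keep standards from slipping.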
I’ve implemented DV without automation, and to echo the other sentiment here, it’s not that hard. The templates are simple and repeatable; they really are a cut & paste job for a halfway competent developer.
There are two counters to that:
- Do you want your halfway competent developer to be cut & pasting boilerplate code for their wage, or do you want them using their skills to write the business rules or complex joins? What would they rather be doing?
- It takes about 1 day to kick out all the templates for loading a single source item into the vault - including testing and “general project admin”. Think about your sources. You probably have at least 30 tables or spreadsheets or other single items. That’s a long time to wait whilst the business is asking why you haven’t implemented this yet and where their “disrupting innovative business insights” are.
If you are going with DV without automation, you need to make very sure the business is OK with paying the manpower cost of doing it, and not OK with paying the automation cost. That being said, a lot of businesses seem to be more OK with paying two developers 45k a year to write boilerplate slowly than 20k a year for a robot to write boilerplate fast, plus 60k a year for someone to run the robot.
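For what it’s worth, the numbers quoted above can be worked through in a few lines of Python. The figures are the illustrative ones from this post (no currency was given), not real pricing, and the variable names are just labels for this sketch.

```python
# Yearly costs, using the illustrative figures from the post.
developer_salary = 45_000
manual_cost = 2 * developer_salary           # two developers writing boilerplate

automation_tool_cost = 20_000                # "the robot"
tool_operator_salary = 60_000                # someone to run the robot
automated_cost = automation_tool_cost + tool_operator_salary

print(f"Manual:     {manual_cost:,} per year")
print(f"Automated:  {automated_cost:,} per year")
print(f"Difference: {manual_cost - automated_cost:,} per year")

# Hand-crafting effort, using the estimate of about 1 day per source item.
sources = 30
days_per_source = 1
handcraft_days = sources * days_per_source
print(f"Hand-crafting {sources} sources: about {handcraft_days} developer-days")
```

On these figures the automated route is the cheaper of the two per year, before even counting the roughly 30 developer-days of hand-crafting for 30 sources.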
We are currently in the throes of implementing DV2 by building our own automation “tool” in Azure Synapse. Add to that the fact that we learned (are still learning) DV2 while simultaneously building the solution. Optimal? No way. Doable? Sure, but not without pain and error along the way.
Despite the challenges, I am extremely grateful for the experience. I have learned quite a bit using this getting-thrown-out-of-the-nest approach, and my appreciation and passion for DV2 has grown immensely.
I believe implementing DV2 isn’t really all that complicated or difficult as long as you consistently apply the proven patterns correctly. I can imagine, though, that it would become monotonous and boring along the way because the process is so incredibly repeatable, like an assembly line.
The greater challenge is to correctly build the robots for that assembly line. Once that is done, you have room to add components, bells and whistles for DV Robot 2.0.