Joining a project and finding a half-built nightmare

Hi all,

I’ve always wondered how different people approach coming into a project that’s clearly been on a rocky path for a while. Say you’ve joined a new role: they’ve got their bespoke data vault that they’re all very proud of, but you take one look at it and can tell something’s amiss. There are some instances of pattern breaking, but most of the time it’s just strange decisions that have become a kind of convention for them. It’s likely hindering their performance in the long run, but the code base is mature to the point that it might be a significant investment to go back and change things.

So what’s your approach?
Accept it warts and all?
Scrap it and start over?
Nudge people to do it right from now on, or are the existing conventions more useful for keeping the code base maintainable?

Interested to hear people’s thoughts!

PS: If my employers are reading this, this does not represent our work! Just a thought experiment based on other vaults it has definitely applied to :joy:

Thanks for raising this important issue, Frankie.

Without revealing the client, for privacy and compliance reasons, I would like to share my experience building a DV 2.0 for the Case Management Subject Area. When I walked into the project, the team had already modelled the SA using RV_HUB_CASE and its associated SATs. A customer Case, such as an insurance claim, has associated entities/activities, such as the Work_Party handling the Case and the Triage of the Case, that change during the Case’s lifecycle. But instead of modelling, for instance, Work_Party as a separate RV_HUB_WORK_PARTY, the team had overloaded all of these entities into RV_HUB_CASE and its SATs! So RV_HUB_CASE had multiple SATs keeping Triage, Work_Party, etc. data!
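For anyone who hasn’t seen this antipattern, it looked roughly like the sketch below (table and column names are simplified illustrations, not the client’s actual schema):

```sql
-- Illustrative sketch only: one hub carrying business keys from several
-- different business objects, with a separate satellite per object
-- hanging off that single hub.
CREATE TABLE RV_HUB_CASE (
    case_hk       CHAR(32)     NOT NULL,  -- hash of whatever BK landed here
    case_bk       VARCHAR(100) NOT NULL,  -- a Case number, a Work_Party id, or a Triage id
    load_dts      TIMESTAMP    NOT NULL,
    record_source VARCHAR(50)  NOT NULL,
    PRIMARY KEY (case_hk)
);

-- Work_Party attributes, but keyed on the "Case" hub
CREATE TABLE RV_SAT_CASE_WORK_PARTY (
    case_hk       CHAR(32)     NOT NULL,  -- points at a Work_Party BK masquerading as a Case
    load_dts      TIMESTAMP    NOT NULL,
    work_party_bk VARCHAR(100),
    party_role    VARCHAR(50),
    record_source VARCHAR(50)  NOT NULL,
    PRIMARY KEY (case_hk, load_dts)
);
```

Nothing in the model tells you which kind of business object a given hub row actually represents; only tribal knowledge does.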

When I queried the lead modeler, he said it would not matter as long as the BKs, and hence the hub hash keys, did not collide within RV_HUB_CASE. He assured me they would not, since their domains were different in the source!

But, of course, the above modelling was semantically wrong besides confusing the hell out of the developers working with the RV tables.

Since almost 70-80% of the modelling & ETL was already done, the project team, as you pointed out, thought it would “be a significant investment to try and go back and change things”. And they were adamant, particularly since they were on a fixed-price contract and would have to bear any rework costs.

The way I resolved this was by unpacking the overloaded RV_HUB_CASE into separate hubs in the Business Vault: BV_HUB_TRIAGE, BV_HUB_WORK_PARTY, etc. We still had the SAT data in the RV. It was the best resolution I could think of at the time.
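Roughly, the unpacking looked like the sketch below, one derived hub per overloaded entity (again, the table names, the record-source value and the hashing/BK treatment are illustrative, not the client’s actual code):

```sql
-- Derive a proper Work_Party hub in the BV from the satellite that had
-- been carrying Work_Party data under the overloaded Case hub.
INSERT INTO BV_HUB_WORK_PARTY (work_party_hk, work_party_bk, load_dts, record_source)
SELECT MD5(UPPER(TRIM(s.work_party_bk))) AS work_party_hk,   -- standard BK treatment + hash
       s.work_party_bk                   AS work_party_bk,
       MIN(s.load_dts)                   AS load_dts,         -- first time this BK appeared in the RV
       'BV.CASE_UNPACK'                  AS record_source     -- marks the row as derived, not sourced
FROM   RV_SAT_CASE_WORK_PARTY s
WHERE  s.work_party_bk IS NOT NULL
  AND  NOT EXISTS (SELECT 1
                   FROM   BV_HUB_WORK_PARTY h
                   WHERE  h.work_party_bk = s.work_party_bk)  -- keeps the load idempotent
GROUP BY s.work_party_bk;
```

The BV hubs are derived entirely from data already landed in the RV, so nothing in the existing RV model or ETL had to change.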

Any alternate suggestions from this DV Group?


I like your suggestion, @squash7733, and it also aligns with the approach used when a single source object (table) contains multiple business objects (though that would not have been a proper approach in the example you gave!).

Regarding the underlying question:

If I join a project as a DV2 expert and see deviations from the standard, I name them, document possible consequences or risks, and outline alternative solutions.

Ultimately, the risks have a price, and the rework has a price. The product owner decides. They are responsible for direct costs and risk costs.


Step 1: Establish principles & goals: what you see and where you think it should go
Step 2: Articulate where the project is and where the gaps are, and by extension what the future-proof value is of following a pragmatic, business-architecture-focused data vault
Step 3: If/when project sponsors are on board, ensure the right people are onboarded
Step 4: Emphasise the cost vs benefit of following a proper DV2.0 approach, i.e. the focus is on integration around business capabilities (and, by extension, business objects)
Step 5: Pilot a POC as a steel thread; in parallel, establish standards and templates for all stages of the data modelling and data engineering life cycle

Baseline what is expected at each delivery: ALL data must integrate by business key (see the sketch at the end of this post)
Accept that not all use cases will go through the vault, but if a business case wants to inherit the maturity established by the use cases served by going through the vault, then it must integrate
Ensure ownership; at some level this is a data contract, particularly on hub table integration, since hubs are after all your business object representations. Hub tables serve as the integration point across an enterprise’s software landscape AND between the business view and the actual automation of business rules. This is why hub tables are vital and why ‘core business concepts’ are flaky and establish nothing. Links, by extension, are the authorised ‘paths’ between business objects, recorded as business events, transactions and relationships.
Elaborate on the benefits of shift-left data modelling, i.e. solving polyglot source-model complexity once and upfront in the data vault model benefits everyone querying that data, because you pay that penalty upfront and, most importantly, only once, not at every query.
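To make the “integrate by business key” point concrete, here is a minimal sketch (source and table names are made up) of two source systems landing on the same customer hub because both loaders apply the same business key treatment before hashing:

```sql
-- Customers from the CRM staging table
INSERT INTO RV_HUB_CUSTOMER (customer_hk, customer_bk, load_dts, record_source)
SELECT DISTINCT
       MD5(UPPER(TRIM(src.customer_no))),      -- one agreed BK treatment for every loader
       UPPER(TRIM(src.customer_no)),
       CURRENT_TIMESTAMP,
       'CRM'
FROM   stg_crm_customer src
WHERE  NOT EXISTS (SELECT 1 FROM RV_HUB_CUSTOMER h
                   WHERE  h.customer_bk = UPPER(TRIM(src.customer_no)));

-- The same customers arriving from billing land on the same hub rows
INSERT INTO RV_HUB_CUSTOMER (customer_hk, customer_bk, load_dts, record_source)
SELECT DISTINCT
       MD5(UPPER(TRIM(src.cust_id))),
       UPPER(TRIM(src.cust_id)),
       CURRENT_TIMESTAMP,
       'BILLING'
FROM   stg_billing_account src
WHERE  NOT EXISTS (SELECT 1 FROM RV_HUB_CUSTOMER h
                   WHERE  h.customer_bk = UPPER(TRIM(src.cust_id)));
```

Downstream consumers then join on customer_hk and never repeat the source-by-source reconciliation; that is the penalty you pay once, upfront, rather than at every query.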