How to merge the same entity from another sources

Hi. Couldn’t find an answer in other threads, would appreciate advice and help.

I am starting to update my data model using Data Vault approach and dbtvault. I am facing a situation where the same business entity (tables wallets, transactions) is in different sources and contains different data for a user (table user).

What should I do in such a situation?
If the entity is the same but different sources, it seems logical to merge them into one HUB with specifying the source in the RECORD_SOURCE field. And attributes should be separated into different satellites. But at what stage should the merge take place, raw_stage or stage and how to be if different set of fields. I create one table with all fields and fill them depending on the source?

Thanks

Depends on the tool… AutomateDV does a UNION ALL across sources, I’d love for that to be independent inserts into the hub, first in best dressed.

Like I discuss here. https://www.snowflake.com/en/blog/data-vault-techinques-hub-locking/

Hi Red,

When you are asking “when should the merge take place” are you referring to loading the HUB or are you referring to being able to combine information from the various source specific SATs for each source HUB key (same business entity but separate HUB Keys).

If it is the latter then you need a Same As Link table to provide the relationship between the HUB keys.

Thanks,

Carl

Hi!

Thanks for the question. Due to how dbt works, the built-in materialisations require 1-1 source to target tables i.e. 1 model can feed 1 source unless you do a UNION/UNION ALL, (mentioned by Patrick above) in the SQL. Please see the docs here.

This does mean you have to load them all at once or not at all, instead of independently as is ideal.

Just recently I saw a customer maintain multiple repos to ensure they do not use UNION ALL; they also changed their hub loader into a MERGE for Snowflake so as to lock the hub table (Data Vault Techniques on Snowflake: Hub Locking). Very very strict naming standards to make it work (to me that alone could be problematic - if that engineer leaves then it could all fall apart). Another associate said she uses dbt’s Aliases as I had described here (What does dbt give you?. Data build tool (dbt) is an open-source… | by Patrick Cuba | Medium) to get around this limitation.

Would love to see this limitation go away – but I understand it is dbt bound!

Is this a worthy successor? https://www.sdf.com/