If I want to model transactions as regular links storing only real business entites relationships, and put all other information together with transaction’s unique id in satellite, like proposed in Data Vault on Snowflake: Expanding to Dimensional Models | by Patrick Cuba | Medium , can I still use DCK (transaction_id) for joining to other objects containing other piece of data for same transactions, coming from different files ?
What do you need to join it to? Other transactions?
Not other, but same transaction - additional data for same transaction (with same transaction id) coming from different source file. This id is meaningless as you said and not used in any other context, but to join these pieces of information which is coming within different files.
Problem is - main file consists of real entities involved in transaction, what could logically go into regular link, with transaction satellite holding transaction_id as DCK + other descriptive data. But in the other files, we will have only transaction_id + some measures like volumes, amounts etc. So this cant go to additional transaction sat connected to mentioned link, cause there are no business keys in it so no way to append it to link record.
So, it can go into separate hub with transaction_id as hub, which is not really recommended… or have this 2 files joined in staging layer prior to raw data vault loading, which is also not recommended.
Why can’t the transaction details in the 2nd file (Volumes, amounts, etc) be added to the LINK_SAT_Transaction satellite? If these are attributes of the Transaction then maybe they could?
Does the 2nd file hold the transaction date multiple times with different values for volumes, amounts, etc? Is there another granularity below transaction ID?
If you can’t add to the 1st LINK_SAT then can you create a 2nd LINK_SAT off the link to hold the information from the 2nd file? Maybe a Multi-active satellite or something?