Hello Data Vault User Group,
I have a modelling question and would appreciate your insights and advice.
I am currently working on a project involving three source tables : ‘Inquiry’, ‘Inquiry_characteristic’ and ‘Inquiry_Period’. All of which are snapshot based.
The core data is extracted from the ‘Inquiry’ table, from which I have created a hub_Inquiry and a regular satellite S_Inquiry.
The second table ‘Inquiry_Characteristic’ contains multiple ‘Characteristic_code’ values per Inquiry. To facilitate pivoting and analysis of these attributes over time, I’ve opted for a multi-active satellite MAS_Inquiry_Characteristic
The third table , ‘Inquiry_Period’ contains historical data regarding the period of investigation. I’ve treated this as a satellite with a dependent child. S_Inquiry_Period
Given that deletions can occur in each table
Is it a good way to create three distinct status tracking satellites associated to the hub H_Inquiry? SS_Inquiry, SS_Inquiry_Characteristic and SS_Inquiry_Period
If not, how can I track deletions for the multi-active satellite and the satellite with a dependent child?
Thank you in advance for your input
Yes, it is vital to create either status tracking satellite or record tracking satellite, otherwise you wont be able to recreate the data sent to the dv. If you cannot recreate the data you have data loss and that is not acceptable. You can extract the dependent child fields and create a peg link. If you do that you dont have to create a multiactive satellite, instead create a regular satellite hanging of the peglink. Tracking gets very simple if you avoid multiactive satellites. My personal preference is to avoid multi active satellite.
Why do you need a secondary staging to infer deletion?
Regarding wheter the source should supply it, it forces the source to create logic, that is a risk. It is much easier to let the source delivers what it has instead of telling the source what rules to implement. Deltas of existance can be derived when loading the vault.
You are not answering my question. Why do you need a secondary staging to infer deletion?
And yes, if you have generated code on the dv side that handles deletion, the source does not have to develop the deletion code. Less cost for everyone. Low risk for incorrect data.
I’ve implemented multiple STSs on one Hub as well. Basically, as my setup (like yours) was snapshot based too, and I knew for a fact that physical deletes in the sources were more of a rule than an exception, I decided to just make it a habit to create an STS for every source table.
MAS+STS (where the STS would be on the grain of the hub). The MAS pattern takes care of tracking deletes of dependent childs within a set, the STS will track deletes of an entire set.
The only thing I never found a nice solution for is how to track deletes in Sats with a DC. It’s also a pattern I didn’t use because it wasn’t supported by AutomateDV. Since this pattern isn’t set-based, you would have to check on the grain of the dependent child if it’s deleted or not. But you’re not going to create an STS with the same DC in the PK, because in your information mart queries you would have to then join your DC-SAT on your DC-STS on both your hashkey and the DC and that’s not something you want to do.
You could, as AHenning suggested, implement a peg-legged link instead of a DC-SAT, then you’ll be able to properly hang an STS off of that link. But a peg-legged link is not a DV2.0 pattern. So that’s something you’d need to see for yourself
Thank you all for taking the time to respond.
The MAS-satellite combined with the STS meets my requirements. Based on @patrickcuba’s load-code for a Multi-active satellite (and status-tracking satellite) loading the satellites is relatively easy. The process of getting the data out (in my case pivoting by Parent Hub first) is also smooth.
However, the satellite with the dependent child does not produce the desired outcome. As @FrenkLoonen mentioned, it is not possible to track the deletes of a dependent child record based on a STS. (The satellite with a dependent child can – according to me and in this case – only be used if the delete status is provided by the source, which is not the case.)
If we can’t use satellites with a dependent child for this purpose, and if @AHenning suggestion of using a peg-legged link with an associated STS isn’t considered a best practice…
What would be the recommended modeling solution ?
I’m not sure if, from a pure text-book perspective, the peg-legged link can be considered more of an anti-pattern than the DC-STS in itself. It’s just that, in order to use the information in the DC-STS, you would always have to join it back to the DC-SAT. And this would be the one place where you would need to create an additional pattern, next to the standard SAT-HUB join based on the hashkey… and that’s where it starts to get ugly (perhaps equally ugly as the peg-legged link :)). It will become harder to generate all code.
I’m guessing you’re not using AutomateDV, meaning you’re using your own set of DV patterns and that you already have a pattern for an STS as well as a DC-SAT (I was using AutomateDV and I actually also created my own STS because it doesn’t exist in AutomateDV). You could create the DC-STS pattern yourself and just go with it.
Then again… you could say the same for the peg-legged link
The answer to your million dollar question: it depends (I’ll get my coat)
It is very harsch to say that the peg link is an anti pattern. It became “bad practise” over a night, it was a succesful pattern of dv1. I really dont know why it was considered bad practice, the benefits, compared to having the dep child in a satellite are:
- Stores less data.
- Inserts data faster.
- Extracts data faster.
- Avoids MAS.
- Easier to implement.
Maybe I exaggerated a bit, yes