Creating Multiple Status Tracking Satellites associated to the same hub

pdoortje · 20 October 2023 11:15

Hello Data Vault User Group,

I have a modelling question and would appreciate your insights and advice.

I am currently working on a project involving three source tables : ‘Inquiry’, ‘Inquiry_characteristic’ and ‘Inquiry_Period’. All of which are snapshot based.

The core data is extracted from the ‘Inquiry’ table, from which I have created a hub_Inquiry and a regular satellite S_Inquiry.

The second table ‘Inquiry_Characteristic’ contains multiple ‘Characteristic_code’ values per Inquiry. To facilitate pivoting and analysis of these attributes over time, I’ve opted for a multi-active satellite MAS_Inquiry_Characteristic

The third table , ‘Inquiry_Period’ contains historical data regarding the period of investigation. I’ve treated this as a satellite with a dependent child. S_Inquiry_Period

Given that deletions can occur in each table
Is it a good way to create three distinct status tracking satellites associated to the hub H_Inquiry? SS_Inquiry, SS_Inquiry_Characteristic and SS_Inquiry_Period
If not, how can I track deletions for the multi-active satellite and the satellite with a dependent child?

Thank you in advance for your input

AHenning · 22 October 2023 18:33

Yes, it is vital to create either status tracking satellite or record tracking satellite, otherwise you wont be able to recreate the data sent to the dv. If you cannot recreate the data you have data loss and that is not acceptable. You can extract the dependent child fields and create a peg link. If you do that you dont have to create a multiactive satellite, instead create a regular satellite hanging of the peglink. Tracking gets very simple if you avoid multiactive satellites. My personal preference is to avoid multi active satellite.

patrickcuba · 23 October 2023 00:52

That depends,

Do you need it? It is an additional satellite you need to manage and STS includes secondary staging to infer deletion.
Can the source supply it? This ultimately much better because the source is the source of truth and therefore you’ll guarantee matching to source and less work for you
If you do need them then yes, an STS per source file/table — you’re eseentially what is happening at the source. If you add another source going to the same hub then it too can have its own STS

AHenning · 23 October 2023 16:31

Why do you need a secondary staging to infer deletion?

Regarding wheter the source should supply it, it forces the source to create logic, that is a risk. It is much easier to let the source delivers what it has instead of telling the source what rules to implement. Deltas of existance can be derived when loading the vault.

patrickcuba · 23 October 2023 19:20

Similar to EFS you need to infer a fact
And no, if you get the source to send you a deletion fact then the source remains the SVOT, less cost to you
For this, and many more lessons on how to build a data vault, refer to this

AHenning · 23 October 2023 20:12

You are not answering my question. Why do you need a secondary staging to infer deletion?

And yes, if you have generated code on the dv side that handles deletion, the source does not have to develop the deletion code. Less cost for everyone. Low risk for incorrect data.

FrenkLoonen · 27 October 2023 14:08

I’ve implemented multiple STSs on one Hub as well. Basically, as my setup (like yours) was snapshot based too, and I knew for a fact that physical deletes in the sources were more of a rule than an exception, I decided to just make it a habit to create an STS for every source table.

Which meant:
Satellite+STS
MAS+STS (where the STS would be on the grain of the hub). The MAS pattern takes care of tracking deletes of dependent childs within a set, the STS will track deletes of an entire set.

The only thing I never found a nice solution for is how to track deletes in Sats with a DC. It’s also a pattern I didn’t use because it wasn’t supported by AutomateDV. Since this pattern isn’t set-based, you would have to check on the grain of the dependent child if it’s deleted or not. But you’re not going to create an STS with the same DC in the PK, because in your information mart queries you would have to then join your DC-SAT on your DC-STS on both your hashkey and the DC and that’s not something you want to do.

You could, as AHenning suggested, implement a peg-legged link instead of a DC-SAT, then you’ll be able to properly hang an STS off of that link. But a peg-legged link is not a DV2.0 pattern. So that’s something you’d need to see for yourself

pdoortje · 30 October 2023 10:03

Thank you all for taking the time to respond.

The MAS-satellite combined with the STS meets my requirements. Based on @patrickcuba’s load-code for a Multi-active satellite (and status-tracking satellite) loading the satellites is relatively easy. The process of getting the data out (in my case pivoting by Parent Hub first) is also smooth.

However, the satellite with the dependent child does not produce the desired outcome. As @FrenkLoonen mentioned, it is not possible to track the deletes of a dependent child record based on a STS. (The satellite with a dependent child can – according to me and in this case – only be used if the delete status is provided by the source, which is not the case.)

If we can’t use satellites with a dependent child for this purpose, and if @AHenning suggestion of using a peg-legged link with an associated STS isn’t considered a best practice…
What would be the recommended modeling solution ?

FrenkLoonen · 30 October 2023 22:13

I’m not sure if, from a pure text-book perspective, the peg-legged link can be considered more of an anti-pattern than the DC-STS in itself. It’s just that, in order to use the information in the DC-STS, you would always have to join it back to the DC-SAT. And this would be the one place where you would need to create an additional pattern, next to the standard SAT-HUB join based on the hashkey… and that’s where it starts to get ugly (perhaps equally ugly as the peg-legged link :)). It will become harder to generate all code.

I’m guessing you’re not using AutomateDV, meaning you’re using your own set of DV patterns and that you already have a pattern for an STS as well as a DC-SAT (I was using AutomateDV and I actually also created my own STS because it doesn’t exist in AutomateDV). You could create the DC-STS pattern yourself and just go with it.

Then again… you could say the same for the peg-legged link

The answer to your million dollar question: it depends (I’ll get my coat)

AHenning · 1 November 2023 06:21

It is very harsch to say that the peg link is an anti pattern. It became “bad practise” over a night, it was a succesful pattern of dv1. I really dont know why it was considered bad practice, the benefits, compared to having the dep child in a satellite are:

Stores less data.
Inserts data faster.
Extracts data faster.
Avoids MAS.
Easier to implement.

FrenkLoonen · 3 November 2023 11:00

Maybe I exaggerated a bit, yes

Topic		Replies	Views
Modelling Question on DV2.0 Data Vault 2.0	27	1894	14 December 2023
Tracking deletes in a regular satellite Data Vault 2.0	9	1436	21 June 2023
Advice for satellite with dependent child keys and hard deletions from source Data Vault 2.0	9	413	11 March 2024
Tracking status for a dependent child data Data Vault 2.0	4	715	8 July 2022
RTS for Sat with dependent child Data Vault 2.0 rts	13	706	1 June 2022

Creating Multiple Status Tracking Satellites associated to the same hub

Related topics