Two or more dependent children in a LINK?

Hi all,

We have two dependent children (subclass and class) that do not have any meaning by themselves. We are thinking of adding them to the LINK as follows:

Question: Is this the correct way to hang two satellites from multiple dependent children in a LINK?

Thanks,
Saqib

@saqib - I am really curious about this one, and I’m not sure I fully understand the situation here.

Questions I have regarding this:

  • Is it possible that CLASS and SUBCLASS would qualify as reference tables?
  • Are you able to provide a sample with data?

I’m very interested in people’s responses here…

Me too! Generally speaking a Reference table does not have hash keys, using dependent child keys is synonymous with degenerate dimensions under Kimball

Hi @Nicruzer and @patrickcuba ,

The data looks as follows:

ITEM_ID CLASS SUBCLASS DEPT
aaa 14 1 100
bbb 14 1 100
ccc 14 1 100
ddd 16 1 100
eee 16 1 100
fff 16 1 200
DEPT CLASS CLASS_DESC Attr1 Attr2 Attr3
100 14 SEDANS nnn nnn nnn
100 16 WOMEN SHOES nnn nnn nnn
200 16 NOTEBOOKS nnn nnn nnn
DEPT CLASS SUBCLASS SUBCLASS_DESC
100 14 1 VEHICLES
100 16 1 SHOES
200 16 1 COMPUTERS

Note that the CLASS and SUBCLASS only hold meaning when combined with DEPT. This is the reason I was thinking of hanging the two SATs (SAT_ITEM_DEPT_CLASS and SAT_ITEM_DEPT_SUBCLASS) from the LINK (LINK_ITEM_DEPT) table.

Thanks,
Saqib

They look like REFs.

If you refer to the book I do mention how to treat ref data and codes,

  • Ref data is provided as autonomous tables & how to apply them to your data
  • Ref data is provided already resolved from source (no need to apply them)

Can a subclass be joined to other classes?

The way you have modelled it suggests that it does, I am betting it doesn’t and therefore modelling it this way breaks the “Unit of Work”. A subclass belong to a class and no other class (right?).

Again, ref data does not need a hashkey, I think they should be ref-code look ups, whether it is in the link or is in the sat is up to you and the business as you understand the data then I do.

For a relationship can the class+subclass change? How would you then model that?
You could consider a Link-Sat holding those ref-code lookups instead. That way the ref-codes could change for a relationship and the model naturally handles that change… because thats what a sat is built for.

Hi @patrickcuba . Is that section 6.4 in the book?

Thanks,
Saqib

Hi @patrickcuba , if we define the SUBCLASS and CLASS as reference tables, how will we keep a audit history of the changes to attributes on those tables?

Ref tables can be historised too, doesn’t mean it must be a satellite table

ah and I’m not sure in which chapter/ section i’ve got info on ref tables!

Yes. It is from Patrick’s book section 6.4. We have a bunch of reference tables. Per Patrick’s book, we defined a reference table as whatever falls under the lookup category. In our case, we track the data from the source system more of change data capture. So I have all the history if I need to but for consumption on Infomart, we just have the latest snapshot reference table. Per Patrick’s book, you can do it as a Hub reference and Hub reference satellite and join it for reporting but for us, reference tables did not categorize as a Hub per business modelling so ended up being reference tables mostly as I said above as lookups.

Thanks @spachunuri and @patrickcuba . CDC is one way, but if we don’t want to depend on the CDC, what is the best approach?

What do the Hub Reference and Hub Reference Satellites look like?

Thanks,
Saqib

yea… you can go with the hub_ref sat_ref approach or simply build sat_refs with natural keys (no hashes) and in that way you limit the number of join tables

hub_refs would be in a sense centralise ref-codes from multiple sources and perhaps build a BV to consolidate it… it’s an approach but you’ll need to balance it with your requirements. For the simplest approach (KISS), refs with change tracking should be fine!

Can i ask how you are getting data from the source system now? and which data warehouse you are using?

Hi @spachunuri , we are using Fivetran with Snowflake.

Thanks,
Saqib

Saqib, there seems to be a bit of confusion regarding this book or that book. To stay DV2 compliant and to reduce confusion, my recommendation is that you stay with the DV2 standards as published by Dan Linstedt. I’m not familiar with the referenced materials, so I cannot attest to whether they are DV2 compliant or not - unless your are referring to your CDVP2 course materials, “Building a Scalable Data Warehouse with Data Vault 2.0”, or any of Dan Linstedt’s blogs and posts.
As it relates to reference data, keep in mind that the team has to determine where to draw the line between what is reference data and what is operational/transactional data. At first glance, and based upon your description, you could treat the relationship as a link with two dependent child keys and hang the descriptive data (Attr1, Attr2, Attr3, etc.) in the link satellite to capture history. The question is, does the class/subclass description change over time? When looking at this data from a reference data perspective, are the class and subclass identifiers buried across your enterprise operational systems as descriptive codes? If they are, then that should influence how the team chooses to design this data. Hopefully this is helpful for clarification.

1 Like

Hi @cmeyersohn ,

Thanks for detailed response. This clears up the confusion I had earlier. Both subclass and class have attributes that can change over time, so we will go ahead and hang it as a SAT from the LINK.

Thanks again,
Saqib