Me too! Generally speaking a Reference table does not have hash keys, using dependent child keys is synonymous with degenerate dimensions under Kimball
Note that the CLASS and SUBCLASS only hold meaning when combined with DEPT. This is the reason I was thinking of hanging the two SATs (SAT_ITEM_DEPT_CLASS and SAT_ITEM_DEPT_SUBCLASS) from the LINK (LINK_ITEM_DEPT) table.
If you refer to the book I do mention how to treat ref data and codes,
Ref data is provided as autonomous tables & how to apply them to your data
Ref data is provided already resolved from source (no need to apply them)
Can a subclass be joined to other classes?
The way you have modelled it suggests that it does, I am betting it doesn’t and therefore modelling it this way breaks the “Unit of Work”. A subclass belong to a class and no other class (right?).
Again, ref data does not need a hashkey, I think they should be ref-code look ups, whether it is in the link or is in the sat is up to you and the business as you understand the data then I do.
For a relationship can the class+subclass change? How would you then model that?
You could consider a Link-Sat holding those ref-code lookups instead. That way the ref-codes could change for a relationship and the model naturally handles that change… because thats what a sat is built for.
Hi @patrickcuba , if we define the SUBCLASS and CLASS as reference tables, how will we keep a audit history of the changes to attributes on those tables?
Yes. It is from Patrick’s book section 6.4. We have a bunch of reference tables. Per Patrick’s book, we defined a reference table as whatever falls under the lookup category. In our case, we track the data from the source system more of change data capture. So I have all the history if I need to but for consumption on Infomart, we just have the latest snapshot reference table. Per Patrick’s book, you can do it as a Hub reference and Hub reference satellite and join it for reporting but for us, reference tables did not categorize as a Hub per business modelling so ended up being reference tables mostly as I said above as lookups.
yea… you can go with the hub_ref sat_ref approach or simply build sat_refs with natural keys (no hashes) and in that way you limit the number of join tables
hub_refs would be in a sense centralise ref-codes from multiple sources and perhaps build a BV to consolidate it… it’s an approach but you’ll need to balance it with your requirements. For the simplest approach (KISS), refs with change tracking should be fine!
Saqib, there seems to be a bit of confusion regarding this book or that book. To stay DV2 compliant and to reduce confusion, my recommendation is that you stay with the DV2 standards as published by Dan Linstedt. I’m not familiar with the referenced materials, so I cannot attest to whether they are DV2 compliant or not - unless your are referring to your CDVP2 course materials, “Building a Scalable Data Warehouse with Data Vault 2.0”, or any of Dan Linstedt’s blogs and posts.
As it relates to reference data, keep in mind that the team has to determine where to draw the line between what is reference data and what is operational/transactional data. At first glance, and based upon your description, you could treat the relationship as a link with two dependent child keys and hang the descriptive data (Attr1, Attr2, Attr3, etc.) in the link satellite to capture history. The question is, does the class/subclass description change over time? When looking at this data from a reference data perspective, are the class and subclass identifiers buried across your enterprise operational systems as descriptive codes? If they are, then that should influence how the team chooses to design this data. Hopefully this is helpful for clarification.
Thanks for detailed response. This clears up the confusion I had earlier. Both subclass and class have attributes that can change over time, so we will go ahead and hang it as a SAT from the LINK.