Modelling Question on DV2.0

We are doing a POC using DV 2.0 , Please advise.

1)Can I have only one HUBKEY with dependent child in LINK table?

The reason I was asking is we can have Multiple instance of same policynumber (Policynumber is HUBKEY) in our source system like example below , The POLICYGUID is PK in source system but Policynumber is a businesskey,

In this case ,Is it right practice to implement in DV2.0 to have a LINK table with just one Policy HUBKEY and PolicyGUID as dependent child and SAT table associated with link table.

|PolicyGUID|Policynumber|
|20069AE4-9F4B-473E-BB5F-000023037A66|P1|
|1F64B9D5-74DD-B960-765D-0000E9731009|P1|
|33CF0E52-F3DA-D63E-A699-000328EAFEF4|P1|
|B58875F9-5455-1484-0C3C-000A640A6C71|P1|
|0A8BBA6C-50F4-48AD-B677-000FC8BEBAF2|P2|
|AF00A2C1-6401-4802-83C2-0018CE93C143|P2|

  1. Are Satellite table used to track the history for HUBS or LINK? Can we have dependent child in SATELLITE table and track the history for combination of HUB and dependent child in SAT table

I think if you scroll through the forum you’ll find the answers to your questions :grinning:

These have been asked for sure

I see few posts on Dependent child on LINK tables where they have multiple hash keys Plus Link dependent child, but my case having only one hash key plus dependent child. Do you see its right approach to have one hash key plus dependent child in LINK table.

Nope, never
Peg legged links are not best practice

Thanks @patrickcuba .keeping POLICYGUID in Satellite, ,Are we allowed to keep track history on HUB Key plus POLICYGUID? we are using wherescape for DV which allows only track history either on HUB or LINK, but In this case we need track on POLICYGUID and POLICYNUMBER… Any suggestions would be appreciated.

Thanks

Hello preddy!

  1. Yes you can have the dependent child in a link structure. This will allow you to track history of which dependent child that exists over time. Just connect an effectivity sat to the link.
  2. Tracking history and existance of a business key or combination of business keys needs a hub or a link. There is no other way to track this in dv.
    Please be very careful using wherescape, it is very limited then your models are growing. Please look for alternative to wherescape, otherwise your project wont scale.

Hi Preddy,

Did you ever figure this out?

I’m new to DV and also trying to resolve whether I can use dependent child keys to track uniqueness within a HUB SAT (as opposed to a LINK SAT) so as to avoid using a Pegged leg LINK.

In my scenario we have a WorksOrder which is an instruction from our organisation to a Contractor for works to be completed. This is a business concept and will be tracked in a HUB. There are also Works Order Lines, with Works Order Line IDs (degenerate dimension), which include the types of work required under the Works Order, and I plan to put the Works Order Line ID in the LINK as a dependent child and have a separate LINK SAT tracking any changes at the Line level (e.g a variation to the cost of the line)

The Works Order can have multiple statuses applied which are actually events that occur in the completion of the Works Order (e.g Order issued, Appointment booked, Appointment Changed, operative On site, Works Complete). These Statuses are held in a Status table. Along with a few other fields there is a Works Order ID field, a status ID field and a Status History ID (PK) (plus Status Date, UserID, etc). As a status can be applied more than once and possibly successively (I need to be able to count instances of the statuses) I want to track the unique combination of Works Order ID, Status ID and Status History ID. I could do this in a HUB but I’m not 100% sure this constitutes a business concept on its own (even though these are events).

So, I was thinking of having a HUB for Works Order and then having a HUB SAT which included the HUB_WorksOrder, the Status ID, the Status History ID and Load Date as a composite primary key in the SQL table.

I’m not 100% sure that the Status History table is purely transactional yet (new table) but if changes can occur to the Status History rows then the HUB_WorksOrder, Status ID and Status History ID could be used as a key to track changes over time.

So, can I use the HUB_WorksOrder, Status ID and Status History ID as a key to track uniqueness within a HUB SAT? Or should that concept become a HUB itself? Or, an alternative method?

Any help would be greatly appreciated.

Thanks in advance.

Carl

1 Like

This seems similar to a Purchase Order (PO) with multiple line items (each line being for a different item + quantity) – do I understand that correctly? In this case, I would have one business concept of a Purchase Order (bk = PO Number) and a second business concept of a Product/Item (bk = Item Number). The PO line item would be a link between the 2 business concepts (associating a particular occurrence of a purchase order with a particular item) and a satellite off the link would have the quantity, etc. associated with that one purchase order line item. If you need to track the status of that line item, there would be an effectivity satellite off the link satellite; if only the status of the overall Purchase Order needs to be tracked then the effectivity satellite would be off the Purchase Order hub.

Purchase order => Works Order, Product => types of work, Purchase Order Line Items => Work Order Line.

Hi SheriDH,

Thank you for taking the time to reply to my question.

Yes, you’re right, it is similar to a Purchase Order (PO) with multiple line items.

Yes, I plan to have the following (not full list but illustrates thinking):

HUB_WorksOrder
HUB_Contractor
HUB_ScheduleOfRates (like Product)
LINK_WorksOrderLine - will hold above HUB_Keys and WorkOrderLineID (as a Dependent Child)
SAT_L_WorksOrderLine - will be able to track changes of the attributes against the WorksOrderLine.

In another post, Patrick Cuba recommends not including the Dependent Child into the LINK_HASH_Key within the LINK. So, I suppose my main question relates to how we can use Dependent Child Keys. In order to get the correct information (point-in-time and current) from the SAT_L_WorksOrderLine, into the InfoMart layer, for a particular WorksOrderLine, is it acceptable to partition by the LINK_HASH_Key AND the Dependent Child?

The question of whether we can partition by a HASH_Key AND a Dependent Child relates to my original post regarding the Status History table too. If I have a SAT_WorksOrderStatus table this would have the HUB_WorksOrder_Key but would also include the Dependent Child Keys of ‘Status_History_ID’ and ‘Status_ID’. My experience so far is of only partitioning by HUB_Keys. I suppose my main question is:

“Can i partition by dependent child keys?”

With regard to the Effectivity Satellites, these are not required in this instance as the source already provides the effective dates and also provides information related to whether the lines (WorksOrder and WorksOrderLine) have been deleted or not.

Thanks in advance,

Carl

How about breaking the rules and putting the dependent child in the Link? What is the worst that could happen? :grinning:

Hi AHenning,

Thank you for commenting.

This was the original post that i was referring to previously: [https://forum.ukdatavaultusergroup.co.uk/t/modeling-a-dependent-child/197/9](https://forum.ukdatavaultusergroup.co.uk/t/modeling-a-dependent-child/197/9)

I read the exchange regarding whether to include the dependent child in the LINK_HASH_KEY or not and to be honest I understand why it should be included in the LINK rather than in a HUB_SAT but I’m not sure I fully understand the reasons for why the Dependent Child should or shouldn’t be included in the LINK_HASH_KEY. It seems that it would be easier to partition by in the LINK_SAT if the Dependent Child was included in the LINK_HASH_KEY.

However, whether it is or isn’t included in the LINK_HASH_KEY doesn’t help me with my other scenario regarding the WorksOrder Status History whereby I think i need to create a HUB_SAT but I need to partition on the HUB_WorksOrder_Key as well as the StatusHistoryID and StatusID for uniqueness. The StatusHistoryID provides the uniqueness I require as a WorksOrder (business concept) can have the same ‘status’ multiple times, and in succession, and I need to be able to distinguish these as their own instance. The Status is really an event that has occurred to track the progress of actions against the WorksOrder. So, i need to be able to identify each instance of each status, and, if these rows can be updated i need to be able to determine which are history rows of an instance and which are unique instances. I believe that partitioning by HUB_WorksOrder_Key, StatusID and StatusHistoryID this will allow me to identify each instance as this will be the unique instance of these 3 values, as well as track historical movements of attributes within these 3 key values.

Basically, I want to partition by the dependent child keys and I’m asking whether this is acceptable or whether there is another approach?

Reading between the lines, with Patrick Cuba arguing that the Dependent Child should not be included in the LINK_HASH_Key then I am inferring that within an associated LINK_SAT you must have to partition on the LINK_HASH_KEY and the Dependent child in order to get the correct information into the InfoMart.

Actually, I’ve managed to find something in Patrick’s book which reads as:

“When considering a dependent child key, the dependent child key becomes a part of the partition key to select the temporal data you want, by extension this applies to intra-day key; a combination of these keys.” (Page 441, The Data Vault Guru a pragmatic guide on building a data vault, Patrick Cuba)

I suppose I’m looking for additional confirmation that my understanding of dependent child keys and how they can be used is correct?

Thanks,

Carl

Hi Carl!
If you choose to include the dependent child in the link then the I see no other option than including it in the hashkey for the link.

Createing a link on HUB_WorksOrder_Key, StatusID and StatusHistoryID will give you what you need. Connect a rts on that link and you can track history. Is this an answar to your question?

Hello,

This is one of the reasons why I choose to use the dependent-child key in the satellite table instead of the link table.

  • Less complicated; dv2.0 does not provide any guidance on whether the dep-key is included in the link-hash key calculation or not (probably intentionally). If it is included then every satellite off that link must be against the same grain. A Link table does not provide the functionality to track a relationship when it returns to its previous state. If the dep-key is not included in the link-hash key calculation then the dep-key value is true for the life of the relationship.
  • I get the grain I want in terms of dep-key usage and changes to the link+depkey. It is also far easier to track and manage vs the multi-active satellite table.
  • I can add more link-satellite tables to the same link that are not about the dependent-child key; by extension, status, tracking, record tracking
  • I can add mixed grain satellite tables off the same link table, a few sats are regular, some include their own dep-keys some can even be multi-active.
  • I reduce the need to join more tables to get to the same result
  • I can also choose to treat an intra-day timestamp as a dep-key and therefore easily manage data loads that includes multiple changes for the day

Just so many reasons why I think dep-keys in link tables are a bad idea!

Hi Andreas,

The main reasons for wanting to include the dependent child in the Link are 2-fold. The ScheduleOfRates (like Product) link to the WorksOrderLine and not the WorksOrder so the granularity of the relationship I require is at the WorksOrderLine level. Secondly, from my understanding, LINK tables are used as a basis for FACT tables in the InfoMart layer and I need my FACT table to be at the WorksOrderLine granularity.

If I create a LINK on HUB_WorksOrder_Key, StatusID and StatusHistoryID then won’t this be considered a peg legged LINK as this LINK table will only contain one HUB_KEY reference? Which should be avoided?

I potentially need to be able to track attribute changes of the HUB_WorksOrder_Key, StatusID and StatusHistoryID partition so I presume, if i create a LINK as you suggest, I would need to create a LINK_SAT and use the LINK_HASH_KEY, created from HUB_WorksOrder_Key, StatusID and StatusHistoryID, as the partition key in the LINK_SAT?

Am I right in thinking that your preference would be to avoid using Dependent Child Keys in Satellites and put them into LINKs where possible?

Thanks,

Hi Patrick,

Thank you for taking the time to reply and thank you for providing the reasoning as to why you prefer to use the dependent-child key in the satellite table instead of the link table. These seem to me be valid reasons for the choice.

Are you suggesting that I should remove the dependent child-key entirely from the LINK into the LINK_SAT?

As I have just commented in my reply to Andreas, the main reasons for wanting to include the dependent child in the Link are 2-fold. The ScheduleOfRates (like Product) link to the WorksOrderLine and not the WorksOrder so the granularity of the relationship I require is at the WorksOrderLine level. Secondly, from my understanding, LINK tables are used as a basis for FACT tables in the InfoMart layer and I need my FACT table to be at the WorksOrderLine granularity.

If we take my second point in isolation for a moment, if i was to move the dependent child key from the LINK and into a SAT then the basis of my FACT table in the InfoMart would be the SAT, rather than the LINK, and I would join to the LINK table only to return the required surrogate keys for the relationships in my Star Schema. The granularity of my FACT table would be coming from the SAT rather than the LINK. Is this an acceptable position?

With regard to my first point, the ScheduleOfRates (like Product) HUB has a relationship to the WorksOrderLine and not directly to the WorksOrder (although as the parent of the Line it does have a weak relationship) so the granularity of the relationship I require is at the WorksOrderLine level.

Can I remove the WorksOrderLine from the LINK and still maintain the weak relationship of WorksOrder to ScheduleOfRates?

If so, then I would be able to move the WorksOrderLine to the LINK_SAT as a dependent-child key and take advantage of the benefits you listed above (if it’s OK to base FACT tables predominantly off a SAT as described in my earlier point). However, in order to determine which WorksOrderLine (dependent-child key) relates to which LINK_HASH_KEY in the population of the LINK_SAT I would now need to join on WorksOrder and ScheduleOfRates, from the WorksOrderLine, which would not guarantee me the uniqueness I achieve with WorksOrderLineID. So, I think i’m back to keeping the WorksOrderLineID in the LINK in order to ensure uniqueness (a WorksOrder could have multiple WorksOrderLines with the same ScheduleOfRates).

My initial question was whether I can partition on dependent-child keys, and, thanks to Patrick’s book ( The Data Vault Guru a pragmatic guide on building a data vault, Patrick Cuba pages 234-244) I have been able to determine that the answer is yes. Thank you Patrick, I’m finding the book extremely useful and informative and would definitely recommend it to others.

With regard to my particular requirement regarding the WorksOrderLine and the LINK I think i’m still a little turned around…

Not suggesting — recommending — link-sats = fact table, see https://www.linkedin.com/pulse/data-vault-snowflake-expanding-dimensional-models-patrick-cuba

The use of dep-keys in sats was taught to me by a DV2.0 authorised trainer; it simplifies everything. The same dep-key can be used in multiple satellite tables to give you that grain of detail you need.

Hi Carl!
I really enjoy your questions and I think this topic is very important to debate. My opinion differs a lot (you might already noticed) from the data vault standards. I have faced the challenges that you have challenged and I have chosen to deviate from the dv2 standards to solve them.
To your questions:
A Link with only one Hub and other keys might not be called a peg link, but that is the object that is needed to be able historize the relations between the hub and the other keys. You use the rts or eff sat on the “peg”- link to track what relations that are valid at a specific point in time. Please remember that you can still hang data on hub level on the hub if you need.

The link hashkey must return different hashes for different statusid and historyid for the same hub so the statusid and historyid must be included in the hash calculation.

I think the multi active satellite causes unnessary headache in a lot of implementations. I just dont try to avoid them, i have stoped using them and now solving multi activity challenges with puting the dependent childs into “peg”- links.

Hi Patrick,

Apologies in my delay getting back to you. Thank you for the link and the recommendation.

I have read that article before and found it useful. However, my focus when reading that article was learning more about PIT and Bridge tables so I must have missed the part explaining that the LINK SAT is the basis for the FACT. Understanding this better helps me understand the argument for the dependent child key(s) to exist in LINK SATs.

I was able to see how your recommendation for moving the Dependent Child Key into the LINK SAT, and Andreas’ suggestion to create a non-historised LINK table with the dependent child key in the LINK HASH KEY, could both work for my requirement. It was then a case of working out which provided the most benefits with regard to flexibility, scalability and my particular requirements.

I decided to implement the approach of having the Dependent-Child Key in the LINK SAT, as per your recommendation. However, for my particular requirements I made a slight change to my thinking which resulted in the creation of 2 LINK tables:

  1. LINK_WorksOrder - this contains the relationships between HUB_WorksOrder and related concepts and remains at the WorksOrder grain. I have requirements to be able to be able to navigate from other transactional type data (e.g. customer feedback) back to WorksOrder to retrieve the related concepts (e.g. Contractor) and didn’t want to have to execute a SELECT DISTINCT in order to bring my results back to the WorksOrder grain, if using LINK_WorksOrderLines.

  2. LINK_WorksOrderLines - this contains HUB_WorksOrder, HUB_Contract and HUB_ScheduleOfRates which are all at the WorksOrderLine grain. I then have a LINK SAT as a child of LINK_WorksOrderLines which uses WorksOrderLineID as the dependent-child key.

I can then retrieve all the information I need by creating a Bridge table which combines (via HUB_WorksOrder) LINK_WorksOrders and LINK_WorkOrderLines (and the related SATs). This Bridge table is the basis of my FACT table.

Gaining a deeper understanding of what Dependent-Child Keys are and how they can be used was integral to my decision making. I have spent plenty of time trying to wrap my head around this concept and utilised your book and articles extensively so thank you for the help :slight_smile:

Hi Andreas,

Apologies also for my delay in getting back to you.

For this requirement also I have decided to put the Dependent-Child keys in the HUB SAT. The Status Date will provide me with the effectivity of each Status and the dependent child-keys allow me to track each unique occurrence of the status within the WorksOrder. If i need current status then I can return the row with the MAX Status Date and if i need point-in-time I can use the Status Date within date filter logic.

For my own understanding and clarity, and please correct me if I’m wrong, I’m under the impression that a SAT with a Dependent-Child Key is not the same as a multi-active satellite as the Multi-active SAT loads the full SET of history rows for a HUB_KEY if anything changes within the attributes of the SET or if the number of rows within the SET changes whereas the SAT with a Dependent-Child Key doesn’t load the full SET but only loads the changes?

Which one to use and when though is currently beyond me… I need to learn more about the benefits of each and where best to use each pattern.

With regard to the peg-leg links, this is something else I need to learn more about. I read somewhere that these should be avoided but, again, my understanding as to why they should be avoided is limited. I can picture in my head how a peg-leg LINK would work for my requirement and you are providing anecdotal evidence that this approach works for you. Instinctively, it seems to make sense to try and create non-historised LINK tables for transaction type data but I suppose I’m coming back to Patrick’s earlier arguments for putting the Dependent-Child key in the SAT as opposed to the LINK. As previously mentioned, I’m new to Data Vault so my experience is limited so I think I’ll give the Dependent-Child Key in the SAT approach a go and see if something comes back at a later date to bite me in the butt.

Thank you again for taking the time to reply and provide your opinions on which approaches work, it has definitely helped shape some of my understanding. :slight_smile:

Happy to help!
Regarding effectivity for the satellite… (be prepared for another deviation from the dv2 standards). I would never trust a data source to provide my dv with effectivity. My suggestion is always to store everything and track what has been delivered with an rts. When you discover that the source has delivered something inaccurate, you can show the the full history of their deliveries.