Your diagram indicates a dependency exists before the link table is created. However, there are instances in the chapter where this dependency is not specifically mentioned. Instead, it is implied that the link table can be created even if there is no table to be linked to it other than the table currently being loaded. I realize that a link table will be needed in the future, but does creating a link table with only 1 table make sense?
When you say “your”, you’re talking to @patrickcuba, right?
This diagram shows the DV1.0 loading pattern, where sequence keys were used instead of hash keys. You had to load your Hubs first, so the sequence keys generated there could then be looked up in your Link load. The load order was:
Hub → Link+HubSat → LinkSat (as mentioned in the figure description).
Hence the ‘staggered’ load.
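To make that dependency concrete, here is a minimal sketch of what a DV1.0-style Link load could look like, with hypothetical table and column names (stg_party_address, hub_party, hub_address, link_party_address) and the Link’s own sequence key assumed to be generated by the database:

```sql
-- DV1.0-style Link load (sketch, hypothetical names).
-- The Hubs must already be loaded, because this statement looks up
-- the sequence keys that were generated during the Hub loads.
INSERT INTO link_party_address (party_sqn, address_sqn, load_dts, record_source)
SELECT DISTINCT
       hp.party_sqn,        -- sequence key looked up from hub_party
       ha.address_sqn,      -- sequence key looked up from hub_address
       stg.load_dts,
       stg.record_source
FROM   stg_party_address stg
JOIN   hub_party   hp ON hp.party_id   = stg.party_id
JOIN   hub_address ha ON ha.address_id = stg.address_id
WHERE  NOT EXISTS (
         SELECT 1
         FROM   link_party_address l
         WHERE  l.party_sqn   = hp.party_sqn
         AND    l.address_sqn = ha.address_sqn
       );
```

Those two joins are exactly why the Link load cannot start until both Hub loads have finished.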
If you don’t apply the correct mappings to your Hubs, you can end up with a Link that you can’t join back to the Hub to retrieve the Business Key. But if you do, it doesn’t matter what order the Hubs and Link get loaded in. At the end of the loading process, all tables will be populated correctly.
Thank you for the explanation of the differences between Figures 7-2 and 7-3. Unfortunately, I’m still confused about how to handle the loading of a link (or bypassing the loading) when you ONLY have 1 of the tables available. My question was → “Instead, it is implied that the link table can be created even if there is no table to be linked to it other than the table currently being loaded. I realize that a link table will be needed in the future but does creating a link table with only 1 table make sense?”
Thanks
I think I don’t fully understand the question… what do you mean by “you only have 1 of the tables available”? Do you mean only 1 Hub? You need two Hubs to create a Link (not taking into account Same-As Links or Hierarchical Links, but I don’t think you’re referring to those right now). Otherwise, what are you modeling in your Link? A relationship between your one Hub, and… what?
Can you maybe show some (dummy) source data that you’re trying to model?
What is confusing to me is that, in parallel processing, links can be loaded at the same time because there is no dependency. I’m interpreting that to mean that a link can be created with only 1 known hub, with the second hub being a Zero Key or Missing Key. Is this a common practice, or do you wait until you have at least 2 hubs that you want to link? If it’s the latter, then the parallel processing will not be parallel in every situation. Yes?
Let’s say we have a source with:
party_id, address_id
clay, your street 123
frenk, my lane 456
Which you will load into the following tables:
hub_party:
party_hkey, party_id
hash(clay), clay
hash(frenk), frenk
hub_address:
address_hkey, address_id
hash(your street 123), your street 123
hash(my lane 456), my lane 456
link_party_address:
link_party_address_hkey, party_hkey, address_hkey
hash(clay||your street 123), hash(clay), hash(your street 123)
hash(frenk||my lane 456), hash(frenk), hash(my lane 456)
You can load all three tables at the same time because there is no loading dependency between them. The only dependency in this scenario is that you of course need to have the data in staging to be able to load the link. If you don’t have the address_id, creating the link doesn’t make sense.
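As a sketch (hypothetical names, MD5 assumed as the hash function, one load_dts per staging batch), the three loads could look like this; each statement reads only from staging, so they can run at the same time:

```sql
-- DV2-style loads (sketch, hypothetical names, MD5 assumed as the hash function).
-- All three statements read only from stg_party_address, so they can run in parallel.

INSERT INTO hub_party (party_hkey, party_id, load_dts, record_source)
SELECT DISTINCT MD5(stg.party_id), stg.party_id, stg.load_dts, stg.record_source
FROM   stg_party_address stg
WHERE  NOT EXISTS (SELECT 1 FROM hub_party h WHERE h.party_hkey = MD5(stg.party_id));

INSERT INTO hub_address (address_hkey, address_id, load_dts, record_source)
SELECT DISTINCT MD5(stg.address_id), stg.address_id, stg.load_dts, stg.record_source
FROM   stg_party_address stg
WHERE  NOT EXISTS (SELECT 1 FROM hub_address h WHERE h.address_hkey = MD5(stg.address_id));

INSERT INTO link_party_address (link_party_address_hkey, party_hkey, address_hkey, load_dts, record_source)
SELECT DISTINCT MD5(stg.party_id || '||' || stg.address_id),
       MD5(stg.party_id),
       MD5(stg.address_id),
       stg.load_dts,
       stg.record_source
FROM   stg_party_address stg
WHERE  NOT EXISTS (SELECT 1 FROM link_party_address l
                   WHERE  l.link_party_address_hkey = MD5(stg.party_id || '||' || stg.address_id));
```

Because MD5(party_id) produces the same value whether it is computed in the Hub load or in the Link load, the Link never has to look anything up in the Hubs.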
The DV1 pattern relied on parent tables being populated first: a child table could only be populated by taking the parent’s surrogate sequence key and reusing it in the child table, and that sequence key was then used for joins, very efficiently.
But along came DV2, where we use hash keys instead. Because surrogate hash keys are deterministic, you can load all related tables at the same time; the tables may still be referred to as parent and child, but their loads no longer need to imply that order.
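A minimal sketch of that determinism, assuming MD5 and a simple standardization rule (trim and upper-case the business key before hashing); any load that applies the same rule computes the same key, so the Hub and Link loads never need to look each other up:

```sql
-- The same business key always yields the same hash key,
-- so Hub and Link loads can derive it independently of each other.
SELECT MD5(UPPER(TRIM(party_id)))                                    AS party_hkey,
       MD5(UPPER(TRIM(party_id)) || '||' || UPPER(TRIM(address_id))) AS link_party_address_hkey
FROM   stg_party_address;
```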
I have a question about the step just before loading: when we create the tables, we define the foreign key (FK) columns for the Link in the DDL. Will the FK constraints cause any problems during loading, given that we load the hash keys from the Hubs into the Link in parallel?
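For reference, a minimal sketch of the kind of DDL this question is about (hypothetical names); with foreign key constraints like these enforced, a Link row can only be inserted after the referenced Hub rows already exist:

```sql
-- Sketch of a Link DDL with enforced FK constraints (hypothetical names).
-- Enforced, these constraints require the referenced Hub rows to exist
-- before a Link row can be inserted.
CREATE TABLE link_party_address (
    link_party_address_hkey CHAR(32)     NOT NULL PRIMARY KEY,
    party_hkey              CHAR(32)     NOT NULL REFERENCES hub_party (party_hkey),
    address_hkey            CHAR(32)     NOT NULL REFERENCES hub_address (address_hkey),
    load_dts                TIMESTAMP    NOT NULL,
    record_source           VARCHAR(100) NOT NULL
);
```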