Loading of Hubs and Links

Bigguy365 · 13 June 2022 19:52

We have a reasonably disciplined shop where consistency exists within various applications when referencing an employee. For example, we have four applications that all use the same key value that identifies the employee. So, the hub load compares the hashkeys, and the record is not loaded when they’re the same. Perfect. Now, I understand that the BKCC code differentiates the same hashkey values for different rows. In the pipeline to load the hub, what is the orchestration flow if a duplicate occurs when the business key value is the same but represents a different employee? In other words, what is the logic used to identify this situation?

patrickcuba · 13 June 2022 20:41

Why do you need to “identify” it in the loader?

The bkcc is used to distinguish identical bk values that load to the same hub but represent different business objects. In that case you will see multiple rows in the hub.

You say the same bk value is used across your apps and refers to the same business object. Then those apps don't need bkcc's, or at least they share the same bkcc. This is excellent and serves Passive Integration well. You will see a single row for the same bk value.

As you know, the bkcc is used in conjunction with the bk to create the hk.

You will not get duplicate keys in the hub.
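To make the bkcc + bk point concrete, here is a minimal sketch of how the hk could be derived. The MD5 choice, the `||` delimiter, the trim/uppercase standardization, and the bkcc values `default` and `app5` are all assumptions for illustration, not something prescribed in this thread.

```python
import hashlib

DELIMITER = "||"  # assumption: a delimiter that cannot occur inside the keys


def hub_hash_key(bkcc: str, business_key: str) -> str:
    """Hash the collision code together with the standardized business key."""
    # Assumed standardization: trim whitespace and uppercase both parts.
    payload = f"{bkcc.strip().upper()}{DELIMITER}{business_key.strip().upper()}"
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()


# Same bkcc + same bk: one hashkey, so one hub row (passive integration).
john = hub_hash_key("default", "12345")
john_again = hub_hash_key("default", " 12345 ")

# Different bkcc + same bk: a different hashkey, so a second hub row.
frank = hub_hash_key("app5", "12345")
```

With this shape, `john` and `john_again` are equal while `frank` differs, which is why the same bk value can appear on more than one hub row without the keys colliding.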

patrickcuba · 13 June 2022 20:43

Bigguy365 · 13 June 2022 22:26

The image below is from Dan’s book

The spreadsheet below contains the example of the loads for the employee hub.

The processing for John Doe resulted in just 1 record being added to the hub. However, the activity “Retrieve Distinct List of Business Keys from Stage” will produce the same hashkey for Frank Smith based on the calculation, even though he is a different employee. Question: what logic is used to identify Frank Smith as a different employee even though his employee number and hashkey are identical to John Doe's? The BKCC should contain a value to reflect that he is not the same employee, producing a different hashkey and thus resulting in him being loaded into the hub.
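For reference, the hub load step I am describing could be sketched roughly as below. This is only my reading of the flow: the function name `load_hub`, the column names, and the placeholder hashkey strings are hypothetical, and the staged rows are assumed to arrive with the bkcc and hashkey already populated.

```python
def load_hub(hub: dict, staged_rows: list) -> list:
    """Insert staged keys whose hashkey is not yet in the hub.

    Returns the hashkeys that were inserted, in load order.
    """
    inserted = []
    for row in staged_rows:
        hk = row["hash_key"]
        if hk in hub:
            # Already present (from the hub or earlier in this batch): skip.
            continue
        hub[hk] = {
            "bkcc": row["bkcc"],
            "business_key": row["business_key"],
            "record_source": row["record_source"],
        }
        inserted.append(hk)
    return inserted


hub = {}
stage = [
    # Placeholder hashkeys; real ones would be computed from bkcc + bk.
    {"hash_key": "HK_DEFAULT_12345", "bkcc": "default",
     "business_key": "12345", "record_source": "app1"},
    {"hash_key": "HK_DEFAULT_12345", "bkcc": "default",
     "business_key": "12345", "record_source": "app2"},
    {"hash_key": "HK_APP5_12345", "bkcc": "app5",
     "business_key": "12345", "record_source": "app5"},
]
new_keys = load_hub(hub, stage)
```

Note that the loader only compares hashkeys. It never looks at the employee name, so the only way the third row becomes a new hub row is if it arrives with a different bkcc.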

Thanks

patrickcuba · 13 June 2022 23:36

Of course!

The bkcc makes up the hashkey together with the bk.

If they are different business objects with the same bk, then you must use a different bkcc.

What am I missing?

Bigguy365 · 14 June 2022 00:57

I guess it comes down to what you're referring to as a “business object”. In my example, my interpretation is that I have 5 different applications supplying business objects: 4 that refer to the same employee and 1 that refers to a different one. Where am I wrong?

Thanks

patrickcuba · 14 June 2022 05:29

The different bk should not be ignored.
Give that one the non-default bkcc.

This allows for equi-joins between a hub and its sats and links, and you will only return the relevant results for that business object.
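A small sketch of what that equi-join looks like, using plain Python structures in place of tables. The table names, column names, and placeholder hashkey strings are made up for illustration; the employee number 12345 and the two names are taken from the example in this thread.

```python
hub_employee = [
    {"hash_key": "HK_DEFAULT_12345", "bkcc": "default", "employee_number": "12345"},
    {"hash_key": "HK_APP5_12345", "bkcc": "app5", "employee_number": "12345"},
]
sat_employee = [
    {"hash_key": "HK_DEFAULT_12345", "name": "John Doe"},
    {"hash_key": "HK_APP5_12345", "name": "Frank Smith"},
]


def join_on_hash_key(hub_rows: list, sat_rows: list) -> list:
    """Equi-join hub to satellite on the hashkey alone."""
    sat_by_key = {}
    for sat in sat_rows:
        sat_by_key.setdefault(sat["hash_key"], []).append(sat)
    return [
        {**hub_row, **sat}
        for hub_row in hub_rows
        for sat in sat_by_key.get(hub_row["hash_key"], [])
    ]


joined = join_on_hash_key(hub_employee, sat_employee)
names_by_bkcc = {row["bkcc"]: row["name"] for row in joined}
```

Joining on the employee number alone would pair each hub row with both satellite rows. Joining on the hashkey returns exactly one name per hub row, because the bkcc is already baked into the key.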

Bigguy365 · 14 June 2022 12:37

Hey Patrick

The good news is that we’re both on the same page here. We both agree that the correct approach is to replace the default for the bkcc with something that would result in a different hashkey for Frank Smith. Where we’re having a disconnect is the logic in the automated processing to distinguish the different employees. From my previous example, you can determine by referencing the employee’s name that it’s not the same as the current 12345 hashkey in the hub. But that only fixes this one scenario. If at all possible, how would you code this to have the bkcc determined at runtime? Perhaps a soft rule for those duplicate hashkey values to determine if in fact it is a duplicate before ignoring the record. If not a duplicate, then add a value to the bkcc and recalculate the hashkey eventually resulting in the Frank Smith record being added to the hub.
Clay

patrickcuba · 14 June 2022 14:52

The bkcc is not determined at runtime.
As I have said, it is included in the generation of the hashkey, and it is therefore impossible to join to the wrong record in the link or sat.
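One way to picture "not determined at runtime" is a fixed mapping from source to bkcc that is decided when the source is onboarded and simply applied at staging. This is a sketch under that assumption; the source names `app1` to `app5`, the mapping itself, and the `stage_row` helper are hypothetical.

```python
# Assumption: decided at design time, one entry per onboarded source.
BKCC_BY_SOURCE = {
    "app1": "default",
    "app2": "default",
    "app3": "default",
    "app4": "default",
    "app5": "app5",  # same bk values, different business objects
}


def stage_row(record_source: str, employee_number: str) -> dict:
    """Attach the pre-assigned bkcc to a staged business key."""
    # An unmapped source raises KeyError: assigning a bkcc is a modelling
    # decision made up front, not something the loader guesses per record.
    bkcc = BKCC_BY_SOURCE[record_source]
    return {
        "record_source": record_source,
        "bkcc": bkcc,
        "business_key": employee_number,
    }


john_row = stage_row("app1", "12345")
frank_row = stage_row("app5", "12345")
```

Nothing here inspects the data to detect a collision. The record from `app5` gets its own bkcc because of where it came from, not because of a duplicate check during the load.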

Bigguy365 · 14 June 2022 15:54

Thank you, Patrick.
