Should Business Key be included in the SAT HASHDIFF? Also what about Depedent Child in the HASHDIFF for LINK SAT

saqib · 1 February 2023 18:01

Hi,

I have seen Business Keys and Dependent Child included in the HASHDIFF of a SAT or LINK SAT. What is the best practice? Does it make a difference?

Thanks,
Saqib

AHenning · 2 February 2023 08:03

It matters. Having the bk inside the hashdiff enables you to only compare hashdiffs against hashdiffs, as long as you compare the latest content in the satellite. It is very beautiful code.

Nicruzer · 3 February 2023 13:05

@saqib - Dan Linstedt actually addresses this directly in an article he published on http://LearnDataVault.com called DV2.0 and Hash Keys (which is included in the practitioner course book as a tab labeled Hash Keys):

** IMPORTANT! Do not forget to ADD the sequence OR the natural primary key field to the HASHDIFF calculation!! This will ensure the most uniqueness of the Hash calculations across multiple similar Satellite rows and greatly reduces the risk of duplicate hashes for different values.

saqib · 3 February 2023 14:11

Hi @Nicruzer . Interesting. I wonder if this is still applicable now that we use SHA-256 with a extremely low probability of collision.

AHenning · 5 February 2023 11:39

My point is: you can load your satellites faster if you include the bk in the hashdiff. Thats because you only have to check for new hashdiffs compared to latest hashdiff for each bk.

saqib · 5 February 2023 13:43

@AHenning . how will include the bk in the hashdiff help that? you will still have to check for new hashdiffs compared to latest hashdiff for each bk.

AHenning · 5 February 2023 13:54

Not for each BK, i will explain.
If you extract a list of a hashdiffs for the latest loaddate for each bk from a satellite. That is just a list of hashdiffs.
Then compare the incoming hashdiff-rows with this list. All hashdiffs in the incoming bucket that doesnt match any hashdiff in the list should be inserted.
This works only because the bk is a part of the hashdiff. Do you follow, or should I go into more detail? Kind regards.

patrickcuba · 5 February 2023 22:07

Hey,

Bkeys can be included in the HashDiff, it makes no difference. Those that suggest it makes it more unique probably have never seen the load code for a satellite that compares the latest hashdiff per hashkey.

Dep-Keys must be included in the hashdiff, they after all simply just attribute columns identified as dep-keys, nothing makes them special to be excluded from hashdiff calculation.

Hope that helps

AHenning · 6 February 2023 13:33

There is a difference between including the BK in the hashdiff and not including it in the hashdiff. The following example says why:

incomming data batch:

bk,country,comment

AA,Cuba,is wrong

BB,Cuba,is wrong

Look at the example above

If we calculate hashdiff for country and comment we got the same hashdiff, for simplicity lets say that the
concat and hashcalculation of “Cuba” and “is wrong” equals 0xCC. This means that the two rows will have the same hashdiff value.

But if we calculate the hashdiff for bk, country and comment we got two different values because we have different bk:s in our example.

Lets call these two 0xAA and 0xBB.

After that we define a second incoming data batch like this:

bk,country,comment

AA,Cuba,is wrong

BB,Cuba,is still wrong

Lets calculate the hashdiffs for incoming data and include the bk.

AA,Cuba,is wrong → 0xAA

BB,Cuba,is still wrong → 0xBD

algo:

Extract current hashdiffs for each BK → 0xAA, 0xBB
Calculate hashdiffs for incoming data → 0xAA, 0xBD
Insert rows where hashdiff in statement 2 is not in statement 1 → 0xBD.

Conclusion:

The hashdiffs gets “more unique” if you include the bk.
The loading process of a satellite can be performed faster because you just have to do an anti-join between incoming data hashdiff and latest hashdiff per bk
instead of matching BK and BK and then compare the hashdiffs. This can be performed because the BK is included in the hashdiff.

patrickcuba · 6 February 2023 20:59

errr right… like i said those that suggest there is a difference have never seen the load code.

AHenning · 7 February 2023 06:26

This might be an eye opener to you but you can load a satellite with different program code.

patrickcuba · 15 February 2023 10:34

Yea clearly! The right way and the wrong way, well done son.

AHenning · 15 February 2023 13:20

Well done being narrow minded. There are still many correct and incorrect ways to write code.

patrickcuba · 15 February 2023 21:24

ok sunshine… lol…

AHenning · 16 February 2023 05:15

It seems to me that you still dont understand the algo I posted. Maybe you should look into it more, maybe you can learn something?

patrickcuba · 16 February 2023 20:42

lol … sure sunshine…

AHenning · 17 February 2023 07:44

Maybe you should address the topic in this thread instead of being unpolite. Do you still think that there is no difference between including and not including the bk in the hashdiff?

patrickcuba · 20 February 2023 23:54

What is surprising is that you think I haven’t seen incorrect implementations like yours before.

So before you share more “expert” opinions in these forums I recommend getting yourself a copy of this: Amazon.com

AHenning · 21 February 2023 07:07

Well, well well. It is very interesting that you still argue that the algo I posted is incorrect. Maybe you can explain why it is incorrect? The algo I posted is nothing that I came up with myself. I was thougth that algo by Michael Olschimke when I did my dv2 certification in 2019.

Topic		Replies	Views
Sat hash Diff Calculation Erwin hashdiff , satellite	1	445	10 April 2024
Why the Business Key should not reside in the SAT? Data Vault 2.0 business-key	9	514	7 April 2022
Composite key , more than 1 business keys Data Vault 2.0 link , hub	5	122	10 November 2024
Query optimization for the Link SAT with Dependent Child Data Vault 2.0 dependent-child , link	12	853	7 June 2022
Business Entity with multiple Business Keys Data Vault 2.0 business-key	1	65	19 March 2025

Should Business Key be included in the SAT HASHDIFF? Also what about Depedent Child in the HASHDIFF for LINK SAT

Related topics