Hash key generation - sequence of columns

For Hub table hash key generation,
we have BKCC (business key collision code) + Employee_Id

For the Satellite Table defined as,
name + phone_no + gender + email_id

If I take the columns of the satellite table which will take part in hash diff generation and arrange them in alphabetical order, and run the hash algo, is there a risk in following this approach?

If the take the columns of the hub table which will take part in hash key generation and arrange them in alphabatical order and then run the hash algo, is there a risk for this approach?

I am aware that a metadata table can be setup to mention the business key columns for hub
And satellite table columns for generation of hash diff.

Want to explore this idea whether it makes sense to follow the approach I am suggestion or are there hidden risks that I am unable to see?

@Nikunj -

Clearly, you’re aware that order matters when hashing any concatenated set of values. The key is that your choice of ordering is consistent and not so much which type of ordering you choose.

In fact, dbtvault uses alphabetical order as the sorting of choice for hashkey and hashdiff generation:

Hashing is sensitive to column ordering. If you provide the is_hashdiff: true flag to your column specification in the stage macro, dbtvault will automatically sort the provided columns alphabetically. Columns will be sorted by their alias.

Whatever you choose, simply ensure that the approach is consistent from that point onward.

If I may also add (and I know you didn’t ask), the other elements of hash generation are also critical to consider. Specifically:

  1. Choose a strong delimiter when concatenating the values
  2. Choose a specific, unique value to represent NULLs from your dataset (example, ‘^^’)
  3. Character set and encoding
  4. Case sensitivity
  5. Stripping leading and trailing white space
  6. Casting ALL data types to VARCHAR (string)
2 Likes

Concise me lord, concise

1 Like

Thanks @Nicruzer and yes, other elements of hash generation are considered. :slight_smile:

1 Like