Link tables with technical key

Hi,

we are facing the problem that our source system is submitting a technical id and a human readable id which we are using as the BK. Now the relations between different Hubs coming from our source systems are only provided by the technical IDs. From my understanding, we should provide in the link table between these two hubs the HKs from the BKs, not from the technical id. The technical ids will only be stored as attributes within the satellites. But where do we “lookup” the BK from the technical IDs? Is it allowed to create something like a mapping table in the staging layer to lookup the BK before loading the data to the vault? Or should we use the provided technical ids to create the hashes in the link table, populate them to the hub as well and create then a SAL table to map between technical and BK? The technical IDs in the source system are unique as well as the BKs.

Example:


Customer:
BK, techID,Name,CreatedByUser
company-123, ab78d, Company 123, ud89d

User:
BK, techID,Name
User-123, ud89d, User 123

Thanks for your support

My advice might differ from what the general dv crowd, please be aware of that.

Dont join or fix(trim is ok) anything before loading datasets into the raw vault, you perform soft business rules and that leads to data loss.

In your case i would model that technical id as a hub and the human readable id as a hub. The naming of the the hubs should be very similar. Create a link between them.

Hope this helps.

This is a classical problem with SAP data for example: SAP shows an external customer id (business partner id) which is used to exchange data with other systems and is the perfect BK candidate but uses an internal technical ID for its own tables.
We’ve always modelled that using a SAL on one single hub containing both keys, which works well. Beware of collisions risks though (you should obviously not have a technical id and and a human readable id with the same value and a different meaning otherwise they will be mixed up in the hub).

I see your point and i know that your suggestion is the common “best practice”.
My point is that if you mix different business key structures(data types and length) in a hub you might face these issues:

  1. Forced to use bkcc with the following side effects:
    A. The business key must also be stored in a satellite.
    B. Logic for handling bkcc when searching the hub.
    C. Might lead to performamce issues when handling bkcc in selects.
  2. Hard business rules cannot be performed for some values in the hub.

Operational reality dictates that some of these things are not available.

Option 1: ask the source to translate before sending it to you — best case, surrogate key will simply be an attribute in the satellite table
Option 2: Load the human readable and technical id to the same hub — they are the same things, one is the surrogate key — use a same-as link to translate for reporting