We currently have a situation where 3 different sources provide the same business key (and value) with 3 different source field names. The “master key” in this case is called EmployeeId. The other two sources come from Adobe and are generically called evar_40 and evar_79.
We will store the business key as EmployeeId in the employee hub table. However, for the distinct satellite tables split out by source, should we name the business key hash using the source field name? For example, would the satellite contain “evar_v40_hash” or should it keep the “EmployeeIdHash” as its hash key field name?
In this case there is technically no need for a same-as-link, since we are storing the same value from the different sources. The difference is the metadata for the value, the nomenclature.
In order to keep auditability, my assumption is that we need to keep the field names as they are from the source for populating the raw vault. If we change the field name, isn’t it technically then part of the business vault?
Please help me understand this distinction.
Same but different Business Objects, or same but the same business objects?
@patrickcuba - Thanks for the reply and the article. I appreciate the prompt response.
If I understand your question correctly, this situation is a “same but the same business objects.” We won’t need a business collision key here. The satellite hash keys for each different source will keep the EmployeeHash name rather than using the name from the source, since the metadata mapping “tracks” the source-to-target relationship (name change).
Only use BKCC if it is needed, else nothing in the DV integrates and you end up with a Source-System-Vault; Anti-Pattern and not the objective of a Data Platform.
The definition and identification of a BK is based on the content and semantic meaning. It should be the best logical/business name for the column. I have regularly had multiple sources where the source column is named different but the meaning is the same. The hash key of the hub EmployeeId should be carried to the Sat. Your source target mapping and ELT code/lineage provide the auditability. There is no requirement to keep the source column name for the BK; however, it is a best practice to keep the source column names for the descriptive attributes in the Sat.
I’ll put my thoughts into the hat. Yes the hub has a bk column named employee_id. The other feeds would map their pk columns into that column - so effectively an alias name change to employee_id. As for the satellites - we work to a standard where the hk in the satellite is always named the same as the parent hub’s or link’s hk. It is a hash value anyway so not the same as the bk column. If you want to preserve information about the original key column name it could always be added as a regular payload attribute in the satellite. This only works where the feeds do not overlap at all, and are effectively all part of the same pk numbering scheme. If they could overlap and are from mixed sources then you’d also use a collision key column.
For me it’s pretty simple.
You have an object called employee_hub, or hub_employee whatever…
That object has two key columns:
employee_bk (yes this should be its actual name)
If the true objective of DV is integration and business centric then it stands to reason that the bk and hk attributes in the hub should have generic names associated with the business concept and not in any way related to any of the sources or someone’s feelings about what it should be called.
From there it does not matter what the individual sources are called they are just all mapped on to the employee_bk attribute because they are the employee business key for that source.