Business Key treatments: what is allowed and what isn't?

I have several sources that all deliver references to a local client. There’s even one source that is now considered the golden source wrt client data, in which they’ve brought all client data from all local systems together, and tried to link those to the ‘global’ client. Clients can be identified with the composite business key “local_id” + “local_schema”. They even applied some BK treatments already in the sense that they do left padding zeros so that every local_id is a 12-digit number. I want to perform the same treatment on the fields from other sources.

The only thing is, let’s say I now have these two sources:

local_id | local_schema
000000000123 | ABC

local_id | local_schema
000000000123 | Alpha Beta Charlie

Alpha Beta Charlie means exactly the same as ABC in the golden source. Is it okay to perform a treatment (which is going to end up a CASE statement or maybe a lookup to an external view)? Or would I have to load it as-is and create a same-as link in the BDV?

Both the CASE-ststement and the lookup are two ways to perform soft business rules.
If you perform a lookup you also create a dependency to another source, this means that the process for loading the dataset stops being autonomous.
I would suggest that you dont do any of them.

Hi Frenk,
This looks like a potential use case for a Same As Link to me - you can then load each occurrence and map to your golden business key.



Same-as link and the link should be provided by the source, do not manufacture this relationship yourself or you will end up managing this technical debt

Thanks for everyone’s input! I will go with the same-as link.

1 Like

But what now when I get the following source:

So here, it is implied by the name of the field that the local_id belongs to the local_schema ABC. What would my mapping be in this case?

i don’t understand… that would map to your hub table, no?

Yes it would, but my Hub contains a composite business key (local_id + local_schema_id). This new source only provides the local_id, the local_schema_id is implied by the name of the source column.

1 Like

Hello Frenk!
My suggestion(might not be data vault standard) is to never put composite business keys in a hub. It makes implementation and automation a lot easier. If you have to combine two or more keys, do it in a link. I also revommend using multiple hubs instead of bkcc.
The models wont be as pretty but it is so much easier to handle when you extract data from the RV.

But what if I get two local_ids with value 123? They don’t mean the same and they can’t live on their own, they only mean something icw the local_schema. So I would get two hubs that can’t/won’t have any satellites on them, only on the link I can create a satellite which holds their descriptive attributes… can’t say I’m too keen on that solution.

If local_id from different sources means diffeent things, i suggest you put them in different hubs.

It is more important that the data can be regenerated (zero data loss) then to create a solution that is keen.