(on recommendation from @patrickcuba , I am posting this question )
What is the main reason we don’t put the Business Keys (not the hashed values) in the SAT? I understand why we don’t put the FK in the SAT. But I am curious why Business Keys are not in the SAT.
As I understand it, the design choice was made to support pragmatic searching.
If your human-readable key is in a satellite table, you need to join the satellite table in order to satisfy the query ‘select * from Entity_Business_View where business_key = ‘<human_readable_key>’’ .
That’s not too much of an issue if you only have one satellite. But if you have 5 or 6 satellite tables (from different sources), then you would have to join all of the tables, and search all of the tables to get your results. That means maintaining an additional unique index (of human_key/load_date) on each satellite table.
Compare this to searching only the hub, then joining only the satellites that have information on the record. You are only maintaining one unique index, and then joining out on PK indexes. Less storage, less index search time, faster queries.
As another angle, if your query only concerns the human_key (such as ‘count number of links on this key’, or ‘give me information about this other linked entity based on this key’ ), you don’t even need to join the satellite if the human_key is in the hub.
Hi @will.list ,
I guess my question was if we already have the Hashed BK and the raw BKs in the HUB, is it against the standards to put the raw BKs in the SAT? If so, what is the design reasoning behind that?
Is it against the standards? I don’t think the standards specifically mention it, but they do heavily imply the question “why would you?”.
Again, this comes back to use case. You can pretty much put anything in a satellite, and the raw BK is just another piece of data. If you have a use case that supports retrieving this piece of data without ever hitting the hub, and also supports the additional storage, go for it. However, that seems a very rare edge case.
got it. Thanks @will.list . I just wanted to confirm that by placing the raw BK in SAT (in addition to the HUB), we are not breaking the methodology in a way that will cause problems down the road.
Why not in both?
If my source for a Dim is a single Sat then why do I need to JOIN for the sake of the lack of BKey?
There is also a case for keeping the untreated business key in the satellite and the treated key in the hub, that way you can derive even more analytics as to why a source might be producing mixed case keys and another source is not. Poorly designed keys that consider an upper case A in a bkey to represent a different business object to the same bkey but with. lowercase A should be called out.
Pretend you’re a customer and the call centre asks you for an identifying number do you then respond with "uppercase A, 5 , 7, 1, 2, lowercase B… ". Although not recommended to use as a business key, Salesforce IDs had a similar issue, they then converted their IDs to CaseSafe IDs.
Another hypothetical… if you have chosen to not use hash keys but only use natural keys in your vault have you not now loaded natural keys into your satellite tables?
good points @patrickcuba . Thanks for the details. We actually had that is with the 15 character salesforce IDs.
Hi Patrick, you upper/lower bk real world issue, made me wonder following situations:
- dv etl uppers masking ensuring just 1 hub entry.
- dv etl doesnt mask upper, meaning a new hub entity inserted.
Number 2 seems in line with dv “load as-is” but i wonder which option you recommend?
I know that as always decisions are based on context, like how often some type of event happens to be called bad record for downstream human to fix source vs having etl massage raw data like upper(bk).
Probably a 3rd solution could be:
3. dv etl uppers bk when checking hub for existence, and then:
Seems 3 the best one right ?
Passive Integration is a DV2.0 standard, we call them Business Key Treatments, business keys are pre-treated before loading them to a hub table, it ensures that all you need is an EQUI-join to join related hubs with their links and satellites across source systems and not have to model some crappy workaround like a another link table to solve an integration debt option 2 would have introduced. Business Key Treatments. Business keys are the crux of a data… | by Patrick Cuba | Medium
Thanks Patrick for confirming/remember us the standard