Hey folks! I’ve been pondering about this for some days, and i was wondering about this:
When building the information marts, as Kimball-style marts (dimensions and fact tables) - Do you use the hash keys as your keys to join between tables, or do you use business keys? And why?
I’ve created a lot of tables in the information marts already using hash keys, and I recently read some guidance from Patrick to not expose them to users, so i was wondering what is the alternative, just using business keys?
Thanks a lot in advance to anyone who can help with this!
@sicarul If you have a DV2 raw vault model, you may compute a hash key to use in the join. You do not expose hash keys to users - Patrick is correct. We never expose hash keys to consumers or end users. The recommended best practice is to use a PIT, a Bridge, or a PIT/Bridge Hybrid to hold the processing results that are then exposed to the users through virtual (views) facts and dimensions. Your view will expose the underlying business keys from the PIT or Bridge or PIT/Bridge Hybrid along with whatever descriptive data may be required (think dimension) or whatever descriptive data makes sense to the user of the Fact view. If you are building virtual facts and dimensions, you can also use a sequence key that can be exposed to the end user since that is what they are most familiar with in a star schema.
@sicarul Dimensional modeling generally all would encourage to put SK’s, however we designed to go with the business keys first time in Dims and Facts. But we are in development state currently.
Think of hashkeys as tools that make sense depending on the scenario/technology being used.
You could be building a DV without any hash-keys at all! In the Info-Marts why would you need hash keys there if it has no purpose to the business? Hash keys are surrogate keys, like INT keys in dimensional modelling they serve a specific purpose.
I did a webinar discussing hash-keys, natural keys and int key joins using PITs, SNOPITs and the like on Snowflake you can find here.
But in hindsight, you are not really intending to expose the hash keys (pk) with users right? you will just use it for joining as well?
I have the same question recently and seems like it’s okay to use the hash keys in dimensions and facts. In a way, it acts as the surrogate keys as well.
We can hide the keys on the reporting layer.
But I am new in DV and this is just my recent realization upon tinkering on this topic for a while now…