DeNormalizing/Flattening Raw vault entities to create Dimensions for the Information Mart

I want to ask what is the best way to create Information Mart dimensions when more hubs (and Satellites) are participating in creating a DeNormalized Dimension.

For e.g., Let’s say there is a Product Hub that has an indirect route to Category Hub via the Product_Category link.

Both Product and Category Hub have their own attributes as well as separate loading dates residing in their respective satellites. The final Product dimension can only be made by joining both these hubs and their respective Satellites.

What is the best way to do that, considering the dimension will have most of the attributes acting as historical (SCD2) attributes?

What looks more challenging to me is to join the two satellites because they both have separate loading dates.
Note:
This scenario gets more complex when the Category hub is further associated with a Brand hub via some Category_Brand link.

Thank you.

It sounds like this is a case for pit & bridge tables. Your bridge table can be used to create ‘links by usage’ and improve performance of building a downstream table such as a dim. Pit tables are used to get point in time relationships across multiple satellite tables. Patrick has a good article on pit tables here: Data Vault on Snowflake: Point-in-Time (PIT) constructs & Join-Trees | by Patrick Cuba | Snowflake | Medium

1 Like

Thanks for the reply. I agree that this problem is similar to the one where a hub has multiple satellites, and PIT comes to the rescue. However, in my case, there are single satellites with their respective hubs joined together in making a dimension. How would PIT solve this issue?

In simple words, PIT helps to join multiple satellites for the same hub. How can we join different satellites of different hubs? I am particularly interested in the query part.

Thanks,
Irfan

Hi Irfan,
The PIT structures would also need to be combined with a BRIDGE table that can be used to denormalize the HUBS and LINKS together into a single table expressing the grain of your target dimension. The approach is iterative SQL development, using the best materialization available to persist if necessary. Start with PITS for your HUBS and LINKS to align a time-line and the assemble (join) based on the story you need to tell in the product dim until you have a virtual representation of the DIM, persisting as required.
Hope that helps,
Bruce

3 Likes