Need help on Building and Maintaining PiTs

It’s the PiTs! It’s not often you get to trot that one out :wink:

So I’m building dimensions atop my Vault, and I want to build PiTs for each hub that’s going into the Information Layer, but reading the DV 2.0 book has left me with a clear concept but little clarity on actual implementation.

  1. He keeps referencing using the Hkeys and it makes no sense to me (p155). Why wouldn’t you want the Hdiffs from each satellite in the pit instead of a bunch of Hkeys?

  2. How do you actually build and maintain these things in Snowflake? I have a mandate of 2 years of info layer, what is a basic creation template?

  3. What is the recommended maintenance method for rolling off old records?

  4. I had planned to have my info layer be a view of the PIT, but there are some derived attributes that need to be created somewhere. Is it wise to store these in the PiTs? It would make the view simple obviously, but need some advice on that.

Thank you all, this community is a breath of fresh air!

The entire series and your questions answered here, https://www.snowflake.com/blog/authors/patrick-cuba/

I can’t do better than refer you to Patrick’s articles. A PIT is a helper table that is only produced when you need one - where you have multiple sats off a hub or link and you need to align retrieval of data to a reporting calendar. The PIT is a pointer table - it tells you which records in a set of satellites are the correct ones for a given date. The PIT contains the primary keys of the satellites - the pk is the hash key + the load datetime. You use the PIT to join to the satellite to retrieve the payload columns you want to read. Patrick discusses the SQL to load the PIT and roll it forward. You might consider derived attributes to be Business vault components, derived by business rules and held in the Data Vault for use in reporting.