I am struggling with defining what becomes a Point in Time (PIT) table for a HUB. Is it a combination of a HUB and ALL of the Satellites that relate to it, or do you build one PIT for a HUB and each individual satellite?
Also are PITs ever virtual, or are they always physical tables? Do you only create PITs for performance reasons? Or do you also create PITs for every HUB. For a simple Hub with a couple of small satellites, would you even bother with a PIT table?
Thanks,
Janet
Hi Janet.
A single PIT table is created for a hub and its satellites or for a link and its corresponding satellites. A good example of what a PIT table’s columns looks like for a classic HUB Customer can be seen below.
PIT tables can be either virtualised or materialized. PIT tables however are primarily built to help with query performance in the Business vault so I find it is easier to generate them as physical tables to speed up the query performance. It can be done either way though, so if you find the performance is fine with the PIT’s as views then feel free to keep them that way and only materialise them once there is a performance drop.
PIT tables should not be built by default and are only created when the need arises.
I hope this helps.
Flynn
1 Like
As Flynn said only build one of you need it. Depending on how much data you have and the databse platform you are using you may not need it. If there is only one Sat you do not need it for sure.
Keep in mind that if you add Sats later you may need to add columns to the PIT, or better (IMO) build a new one so you have the old one for audit.
If you are usimg Snowflake you can likely build it with a view. In that case it is more to simplify the queries against the Business Vault, rather than for pure performance reasons.
Check out my book on Agile Data Engineering for an example and sample code.
The book also gives details and an example of a Bridge tanle and when to use that.