We receive transactional data from the source. It’s a big table (not that many columns but many rows) and the transactional value for the last months could be updated from the source.
Since the source data could change, we have so far mocked it as normal link and link satellite. However, this means our PIT and Record Source Tracking Satellites tables grow very fast.
In my understanding, Non-historized links are usually used to represent a fact table, but only for immutable data.
So my question is: Is there any clever method to mock the changeable transactional data as Non-historized link? Or we can’t avoid using normal link satellite and having giant PIT/RTS?
Many thanks in advance!
This can be done. Depends on how the mutable data is sent. When you get an update, do you get a batch for the entire day/period or do you only get the updated rows from the source?
Cool! Thanks AHenning
We get a batch for all the data in the previous 6 months window. Data coming before the 6 month are considered as not changeable anymore.
I would go for the link and the link satellite in your case. Make sure your RTS only tracks the change in existance of a bk. Remember that you only need one RTS. The pit is only needed for performance reasons, is it vital in your case?
We haven’t built the information mart yet, so we don’t have the performance issue there. However, since it’s a large table with frequent updates, I assume that we could have performance issue without PIT.
Do you mean that as long as we receive a snapshot for the whole batch, we need to mock it as normal link and link satellite; and if we only receive delta, then we can mock it as NHL?
Basically yes. As long as “deltas” is for a fixed period, lets says all transactions from yesterday.
Are you going to load the mart with data or will the mart be views from raw-vault? If it is going to be loaded I would hesitate with the pits until performance issues have arisen.
Not quite — nh-links are for immutable source loads that are presented in near real time.
What you have mocked up is correct, however, why would your PIT be huge?
Consider logarithmic PIT structures with managed windows. If you need to report on data for quarter end, or even business day-fridays then your you’d have two PITs - quarter-end PIT and Friday-PIT with only the keys for that snapshot period. Your PITs are disposable, you can recreate at will and never lose the auditability of the data vault.
What’s your use case for RTS? Interesting, because we don’t see many customers using it unless for a very specific use case — they tend to be long and thin and thus not really a storage concern. That said, what platform are using for your DV?