We have built Data Vault for Offer and PrintOut and pretty much virtualized BV components (see attached diagram).
As there are quite a few dependencies, data volumes significant and solution is deployed on the on-prem SQL Server box, we experience poor performance. Additionally as the business logic is based on the latest entries in raw satellites, BV is not auditable and shows only latest calculated values. The advantage of the solution is that code is implemented once and is reusable.
Here is the current sequence of steps and dependencies in the current pipeline
- Raw Vault – in parallel
- Load Offer_Hub, Offer_Sats from raw sources
- Virtual BV (SQL Views)
1a Calculate Offer_Tags based on Offer raw vault satellites
1b Calculate Offer_PrintOuts based on PrintOut raw vault
2a Calculate Offer_Params based on Offer_Tags
- Expose Offer data in consumption layer (as virtual Offer DIM)
We considered adding PITs and Business Satellite to accelerate the pipeline and add audit to BV, but we struggle with additional dependencies that appear and questions:
- should we create PITs for raw vault and use them to accelerate build out of business satellites?
- should we create another layer of PIT to accelerate DIM?
- how to shape business satellites? Per topic, 1 broad satellite? should the simple attributes ( COALESCE(sat1.Att1, sat2.Att2) ) be included in business satellite as well?
I’d appreciate your feedback on above.