Refactoring of business vault


We have built a Data Vault for Offer and PrintOut and have virtualized most of the BV components (see attached diagram).

As there are quite a few dependencies, data volumes are significant, and the solution is deployed on an on-prem SQL Server box, we experience poor performance. Additionally, as the business logic is based on the latest entries in the raw satellites, the BV is not auditable and shows only the latest calculated values. The advantage of the solution is that the code is implemented once and is reusable.
Here is the sequence of steps and dependencies in the current pipeline:

  1. Raw Vault – in parallel
  • Load Offer_Hub and Offer_Sats from the raw sources
  2. Virtual BV (SQL views)
    1a. Calculate Offer_Tags from the Offer raw vault satellites
    1b. Calculate Offer_PrintOuts from the PrintOut raw vault
    2a. Calculate Offer_Params from Offer_Tags (so it depends on 1a)
  3. Expose the Offer data in the consumption layer (as a virtual Offer DIM)
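For context, the virtual BV steps above look roughly like this. This is a minimal sketch only; the schema, satellite, and column names (rv.Offer_Sat_Main, rv.Offer_Sat_Details, TagCode) are hypothetical stand-ins for our actual objects:

```sql
-- Sketch of step 1a: Offer_Tags as a view over the raw Offer satellites.
-- Only the latest entry per hub key is used, which is why the result
-- is not auditable and shows only the latest calculated values.
CREATE VIEW bv.Offer_Tags AS
SELECT s.Offer_HK,
       COALESCE(s.TagCode, d.TagCode) AS TagCode   -- simple derived attribute
FROM   rv.Offer_Sat_Main s
LEFT JOIN rv.Offer_Sat_Details d
       ON d.Offer_HK = s.Offer_HK
      AND d.LoadDate = (SELECT MAX(x.LoadDate)
                        FROM   rv.Offer_Sat_Details x
                        WHERE  x.Offer_HK = d.Offer_HK)
WHERE  s.LoadDate = (SELECT MAX(y.LoadDate)
                     FROM   rv.Offer_Sat_Main y
                     WHERE  y.Offer_HK = s.Offer_HK);
```

The MAX(LoadDate) correlated subqueries repeated in every view are a large part of the performance problem once the satellites grow.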

We considered adding PITs and a Business Satellite to accelerate the pipeline and add auditability to the BV, but we struggle with the additional dependencies that appear, and with these questions:

  • Should we create PITs for the raw vault and use them to accelerate the build-out of the business satellites?
  • Should we create another layer of PITs to accelerate the DIM?
  • How should we shape the business satellites: one satellite per topic, or one broad satellite? Should simple derived attributes (e.g. COALESCE(sat1.Att1, sat2.Att2)) be included in a business satellite as well?

I’d appreciate your feedback on the above.


  1. What is PrintOut? Is that a business object with an immutable business key?
  2. Why not load the Business Vault sats by reusing the same loaders as for the Raw Vault sats?
  3. PIT tables can have many variations:
  • A Current PIT (C-PIT) points to the active record per business key; you might test this configuration to see whether joining the Sat through the C-PIT gives you the performance you are after
  • PITs are tables, and the default implementation is to build the information marts as views utilizing this PIT structure
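A C-PIT of the kind described above could look like the following sketch. The table and column names are hypothetical; the point is that the mart view replaces the MAX(LoadDate) subqueries with a simple equi-join:

```sql
-- Hypothetical C-PIT: one row per hub key, pointing at the current
-- (active) record in each satellite.
CREATE TABLE bv.Offer_CPIT (
    Offer_HK             BINARY(16) NOT NULL PRIMARY KEY,
    Sat_Main_LoadDate    DATETIME2  NOT NULL,
    Sat_Details_LoadDate DATETIME2  NULL
);

-- Mart view joins through the C-PIT on key + load date:
SELECT p.Offer_HK, m.TagCode
FROM   bv.Offer_CPIT p
JOIN   rv.Offer_Sat_Main m
  ON   m.Offer_HK = p.Offer_HK
 AND   m.LoadDate = p.Sat_Main_LoadDate;
```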
  1. PrintOut stores information about printed-out documents. It has its own unique key and is loaded from a separate system; it may be linked to offers, contracts, and other objects.
  2. We can reuse the Sat loading template, though we are not sure how to tackle the levels of dependencies and how to shape the business satellites (one business sat with all calculated attributes, or split by area?). Do we build a PIT on top of the raw sats to accelerate the calculation of the business sats and then another PIT for the business sats, or maybe one PIT covering both raw and business sats?
  3. Yes, we are still looking to virtualize the InfoMarts on top of PITs. The C-PIT looks like a good fit for the near-real-time pipeline we are developing as well, but not so much for the ones where we need to keep snapshots.
  1. That would be at your discretion: whatever works best for your analytics. Further, the BV sat could follow a framework that is a pivot of the content, or the output could even be stored in a semi-structured column, because the metrics may change. Now, a PIT’s job is to get the data needed quickly; the downside is that you need to build the PIT. However, there are ways you may consider to populate a PIT efficiently, i.e. build it in parallel with the loading of the sats so you effectively only push the needed keys and load dates to the target PIT (and probably UPDATE the load-date value based on the parent key), or, if you’re on Snowflake, deploy a stream on top of the sats and process only the new content for the PIT. PITs are helper tables; if you need one to help you load a BV SAT, then so be it!
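The incremental "push only the needed keys and load dates" idea above can be sketched as a MERGE run right after the sat load. This is an assumption-laden illustration (the #delta staging table and the C-PIT columns are hypothetical), not a prescribed loader:

```sql
-- Hypothetical incremental C-PIT maintenance in T-SQL.
-- #delta holds only the keys and load dates that arrived in this batch,
-- so the C-PIT is maintained without rescanning the full satellite.
MERGE bv.Offer_CPIT AS tgt
USING (SELECT Offer_HK, MAX(LoadDate) AS LoadDate
       FROM   #delta
       GROUP BY Offer_HK) AS src
   ON tgt.Offer_HK = src.Offer_HK
WHEN MATCHED THEN
     -- parent key already present: just advance the pointer
     UPDATE SET tgt.Sat_Main_LoadDate = src.LoadDate
WHEN NOT MATCHED THEN
     INSERT (Offer_HK, Sat_Main_LoadDate)
     VALUES (src.Offer_HK, src.LoadDate);
```

On Snowflake, a stream on the satellite can play the role of #delta, so the MERGE consumes only new content.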

  2. Snapshot PITs at a logarithmic cadence can be driven by your as_of table with business flags, much like you would develop a calendar dimension for Star Schemas, right?
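One way to read the as_of-table idea above is the following sketch. All names and the specific cadence (daily for a week, weekly for a quarter, monthly beyond) are assumptions for illustration:

```sql
-- Hypothetical as_of table with business flags, calendar-dimension style.
CREATE TABLE bv.As_Of_Date (
    As_Of_Date   DATE NOT NULL PRIMARY KEY,
    Is_Week_End  BIT  NOT NULL,
    Is_Month_End BIT  NOT NULL
);

-- Thin the snapshot PIT to a logarithmic cadence by keeping only the
-- dates still flagged at each horizon.
DELETE p
FROM   bv.Offer_PIT p
JOIN   bv.As_Of_Date a
  ON   a.As_Of_Date = p.Snapshot_Date
WHERE (a.As_Of_Date < DATEADD(DAY, -7,  CAST(GETDATE() AS DATE)) AND a.Is_Week_End  = 0)
   OR (a.As_Of_Date < DATEADD(DAY, -90, CAST(GETDATE() AS DATE)) AND a.Is_Month_End = 0);
```

The flags make the cadence a data-driven decision rather than logic hard-coded into the PIT loader.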