Need some real/sample examples of Information Mart using Raw and Business vault

Hi,

I need real examples of converting a raw data vault into Information Mart (star schema). All the examples I found on the internet are too simple. I need to know the following.

  1. Using PIT and Satellites to create a dimension
  2. Using Bridge (and/or Link) tables to create fact tables.
  3. How multiple hubs and their satellites (with or without a PIT) can be used to create a dimension (e.g., Category and Product)
  4. How staging and CDC can be used in an effective way to achieve all of the above.
  5. How effectivity and status-tracking satellites can be used to create these marts.
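To make item 1 concrete, here is a minimal sketch of joining a PIT table to its satellites to build a snapshot-style dimension. All table names, column names, and data are hypothetical; this is just the shape of the join, not an example from any book.

```python
# Satellite rows keyed by (hub_key, load_date) -- hypothetical data
sat_product_details = {
    ("P1", "2024-01-01"): {"name": "Widget", "colour": "red"},
    ("P1", "2024-02-01"): {"name": "Widget", "colour": "blue"},
}
sat_product_price = {
    ("P1", "2024-01-01"): {"price": 9.99},
}

# PIT: for each hub key and snapshot date, the applicable load date
# per satellite has already been resolved.
pit_product = [
    {"hub_key": "P1", "snapshot_date": "2024-01-15",
     "details_ldts": "2024-01-01", "price_ldts": "2024-01-01"},
    {"hub_key": "P1", "snapshot_date": "2024-02-15",
     "details_ldts": "2024-02-01", "price_ldts": "2024-01-01"},
]

def build_dim_product(pit, details, price):
    """Equi-join the PIT to each satellite on its pre-resolved load date."""
    dim = []
    for row in pit:
        rec = {"hub_key": row["hub_key"], "snapshot_date": row["snapshot_date"]}
        rec.update(details[(row["hub_key"], row["details_ldts"])])
        rec.update(price[(row["hub_key"], row["price_ldts"])])
        dim.append(rec)
    return dim

dim_product = build_dim_product(pit_product, sat_product_details, sat_product_price)
```

The point of the PIT here is that the expensive "which satellite row applies at this date" logic is done once, so the dimension build is a set of simple equi-joins.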

Please suggest some good books with chapters on these topics. I don’t mind Microsoft samples (Northwind, WideWorldImporters, AdventureWorks) either. Currently, I can’t find any good books or internet materials on these topics. Maybe I have been searching the wrong way. So please help me find them.

Thanks and Regards,
Irfan

I do agree that there isn’t a huge amount of content out there on the information-mart layer specifically as it relates to Data Vault (as opposed, of course, to dimensional modelling in general).


If you use auto-increment columns in each of the sats, you can build a Bridge table to simulate the fact table. I’ll show how to do this in an upcoming blog post; no date for when it will come out yet! 🙂
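A rough sketch of that idea (my own illustration, not the upcoming blog post): give each satellite row an auto-increment surrogate key, then let a Bridge row carry those keys plus the measures, so it can stand in for a fact table. All names and values here are made up.

```python
from itertools import count

# Each satellite gets its own auto-increment sequence (hypothetical data)
cust_seq = count(1)
sat_customer = [{"sk": next(cust_seq), "hub_key": "C1", "segment": "retail"}]

order_seq = count(1)
sat_order = [{"sk": next(order_seq), "hub_key": "O1", "amount": 120.0}]

# Bridge row: one surrogate key per participating satellite, plus measures
bridge_sales = [{
    "customer_sk": sat_customer[0]["sk"],
    "order_sk": sat_order[0]["sk"],
    "amount": sat_order[0]["amount"],   # measure materialised into the bridge
}]

# Querying the "fact": join the bridge back to a satellite on its integer key
cust_by_sk = {r["sk"]: r for r in sat_customer}
fact_rows = [{"segment": cust_by_sk[b["customer_sk"]]["segment"],
              "amount": b["amount"]}
             for b in bridge_sales]
```

The joins are on plain integers, which is exactly why the auto-increment columns make the bridge behave like a conventional fact table.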


I am waiting for it desperately. 🙂

Will this be an evolution of this one?

Yes, you might figure it out before I publish!

Temporal surrogate sequence keys: if you pull those into a Bridge table that includes the metrics needed for your fact table, you have facts and dimensions. Reuse the history views over the other sats as pseudo-dimensions (they have the auto-increment column), and you have a star schema you can slice and dice using a date dimension.
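The star described above can be sketched like this (a minimal illustration under my own assumptions; every name and number is invented): the Bridge with its metrics plays the fact, a history view over a satellite plays a pseudo-dimension keyed by its temporal surrogate key, and a date dimension drives the slicing.

```python
from collections import defaultdict

# Pseudo-dimension: history view over a satellite, keyed by temporal surrogate key
hist_product = {1: {"category": "Tools"}, 2: {"category": "Toys"}}

# Date dimension (trivially small here)
dim_date = {"2024-02-01": {"month": "2024-02"},
            "2024-02-09": {"month": "2024-02"}}

# Bridge acting as the fact: integer keys into the pseudo-dimension, plus metrics
bridge = [
    {"product_sk": 1, "order_date": "2024-02-01", "qty": 3},
    {"product_sk": 2, "order_date": "2024-02-09", "qty": 5},
    {"product_sk": 1, "order_date": "2024-02-09", "qty": 2},
]

# Slice and dice: total quantity by category and month, joined on simple integers
totals = defaultdict(int)
for b in bridge:
    key = (hist_product[b["product_sk"]]["category"],
           dim_date[b["order_date"]]["month"])
    totals[key] += b["qty"]
```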


Thank you, Patrick. I get the idea of using temporal surrogate keys to make joins faster, since they are simple integers.
If you can, suggest something on the CDC side. Do you think the PIT can/should be CDC-based as well, to cater to multiple satellites on the next run? Similarly, the same could be true of the bridge table for loading facts. What do you think?

CDC based? Incremental PITs?

By CDC, I meant “change data capture.”

Actually, I assume there are two pipelines. The first one takes everything from the staging/landing zone into the Raw Vault and the Business Vault (PIT and Bridge).

The second one takes everything from the Raw Vault (and Business Vault) into the Information Mart (star schema).

It is the second part I am most interested in right now. You suggested temporal surrogate sequence keys, which will help load the information mart faster. I understand that part.

My question is whether (and how) CDC (i.e., change data capture) can improve the performance of loading the information mart, given that this pipeline will be loading continuously.
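One common incremental pattern (an assumption on my part, not something anyone in this thread has endorsed) is to drive the mart load from the vault's load timestamps: each run applies only rows newer than the last high-water mark, which is the same effect CDC gives you upstream. Names and data below are illustrative only.

```python
# Hypothetical vault rows carrying a monotonically increasing load timestamp
vault_rows = [
    {"hub_key": "C1", "load_ts": 1, "balance": 100},
    {"hub_key": "C1", "load_ts": 2, "balance": 150},
    {"hub_key": "C2", "load_ts": 3, "balance": 80},
]

def incremental_load(mart, rows, high_water_mark):
    """Apply only changes captured after the last successful run;
    return the new high-water mark."""
    new_hwm = high_water_mark
    for r in rows:
        if r["load_ts"] > high_water_mark:
            mart[r["hub_key"]] = r["balance"]   # upsert the latest value
            new_hwm = max(new_hwm, r["load_ts"])
    return new_hwm

mart = {}
hwm = incremental_load(mart, vault_rows, 0)    # initial load picks up everything
hwm = incremental_load(mart, vault_rows, hwm)  # re-run: nothing new, no-op
```

The second call touches no rows, which is the whole point: continuous loading only pays for what changed since the last run.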

I know CDC, mate. I was wondering in what context with respect to Data Vault.
CDC, snapshot, accumulating snapshot: it doesn’t matter.

  • Raw Vault = Hub, Link and Satellites (auditable and non-disposable)
  • Business Vault = Link and Satellites (auditable and non-disposable)
  • Query Assistance = PITs and Bridges (disposable)
  • Information Marts = Flat Wide, Views, Star Schema … whatever (disposable)

Info marts, being disposable, are by default deployed as views until performance lags or better latency is desired. Snowflake’s new Dynamic Tables are a great fit for this area.


Thanks for the clarifications. Do you recommend that historized data (the different values in satellites, each with a start date and an end date) should be maintained in the information mart? If so, how do you expose the complexity of the point-in-time logic to end users?
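To make the question concrete, one way the start/end-date complexity is often hidden (my assumption, not a recommendation from this thread) is to wrap the historized satellite behind an "as of" helper, much as an as-of or current-state view would in the mart. Everything below is hypothetical.

```python
# Historized satellite rows with validity intervals (ISO dates compare as strings)
sat_history = [
    {"hub_key": "P1", "start": "2024-01-01", "end": "2024-02-01", "colour": "red"},
    {"hub_key": "P1", "start": "2024-02-01", "end": "9999-12-31", "colour": "blue"},
]

def as_of(history, hub_key, date):
    """Return the satellite row valid for hub_key at the given date,
    using the half-open interval [start, end)."""
    for row in history:
        if row["hub_key"] == hub_key and row["start"] <= date < row["end"]:
            return row
    return None
```

End users then ask "what did P1 look like on this date" without ever seeing the interval logic.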