ETL pattern for Stage->DV

Hello, dear community!
I’ve been trying to work out the best pattern for loading the DV core objects from staging tables.

Let's say we have N staging tables from M source systems.
What is the best approach to loading hubs, links and sats? The options I can think of:

  1. Treat each stage table as an atomic pipeline that is responsible for loading all the core objects that depend on that stage table. The processes are then decoupled and can be scheduled at whatever rate the stage table refreshes.

  2. Move data layer by layer: from landing to staging, and then to core ONLY WHEN ALL the staging is done. In that case we have pipelines per hub, sat or link, each of which may read from several stage tables.

Option 1 seems better to me, as we can have different load rates depending on the business area (domain), as opposed to option 2, where we need to wait for all the domain data to arrive.
(In my project, different business areas refresh at different rates.)

Or maybe I’m looking at this completely the wrong way?
Thank you!

Yup … each landed content should be its own ‘universe’ and independently load its mapped hubs, links and sats.
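
To make the "own universe" idea concrete, here is a minimal sketch in Python, not a definitive implementation. It assumes hash keys are derived from normalized business keys and that hub loads are insert-only, so each stage table's pipeline can run on its own schedule and in any order. All table, column and function names (`stg_pos`, `stg_loyalty`, `pipeline_pos`, `pipeline_loyalty`) are hypothetical, and MD5 is used only for illustration.

```python
import hashlib


def normalize(business_key):
    """Trim and upper-case a business key so every source hashes it the same way."""
    return str(business_key).strip().upper()


def hash_key(*business_keys):
    """Derive a hash key from one or more business keys (MD5 is an assumption)."""
    joined = "||".join(normalize(bk) for bk in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def load_hub(hub, business_keys):
    """Insert-only load: keys already present in the hub are left untouched."""
    for bk in business_keys:
        hub.setdefault(hash_key(bk), normalize(bk))


def pipeline_pos(vault, stage_rows):
    """Pipeline for the hypothetical POS stage table: loads only the hubs mapped to it."""
    load_hub(vault["hub_store"], (r["store_id"] for r in stage_rows))
    load_hub(vault["hub_product"], (r["product_id"] for r in stage_rows))


def pipeline_loyalty(vault, stage_rows):
    """Pipeline for the hypothetical loyalty stage table; it also feeds hub_store."""
    load_hub(vault["hub_customer"], (r["customer_id"] for r in stage_rows))
    load_hub(vault["hub_store"], (r["store_id"] for r in stage_rows))


def new_vault():
    return {"hub_store": {}, "hub_product": {}, "hub_customer": {}}


stg_pos = [{"store_id": "S1", "product_id": "P1"},
           {"store_id": "S2", "product_id": "P1"}]
stg_loyalty = [{"customer_id": "C1", "store_id": "s1 "}]

# Run the pipelines in one order...
vault_a = new_vault()
pipeline_pos(vault_a, stg_pos)
pipeline_loyalty(vault_a, stg_loyalty)

# ...and in the other order, with the POS pipeline refreshing twice.
vault_b = new_vault()
pipeline_loyalty(vault_b, stg_loyalty)
pipeline_pos(vault_b, stg_pos)
pipeline_pos(vault_b, stg_pos)
```

Because the loads are insert-only and keyed by a deterministic hash, `vault_a` and `vault_b` end up with the same hub contents, which is what lets each stage table be scheduled at its own refresh rate.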


Many thanks for your reply!
But having said this, what if my source table data is insufficient to fill a link table?
For example, I have a sale in a retail store (so the linked hubs are store, product and customer).
System 1 has only the store and product for a transaction; the customer for that transaction can be sourced only from System 2.
For instance, System 1 is a POS system and System 2 is a customer loyalty system, which integrates with System 1 via cash receipts to bring in the customer ID.

If there is no relation to the customer in System 1, you cannot create the link that connects the store, product and customer hubs.
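
One way to picture this rule, as a rough Python sketch rather than a prescribed design: a link row is only built when the stage row actually carries every business key the link needs. The helper `build_link_row`, the column names, and the use of `receipt_id` as the key shared by both systems are assumptions made for illustration; MD5 is likewise just an example hash.

```python
import hashlib


def hash_key(*business_keys):
    """Derive a hash key from normalized business keys (MD5 is an assumption)."""
    joined = "||".join(str(bk).strip().upper() for bk in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def build_link_row(key_columns, stage_row):
    """Return a link row, or None when the stage row lacks any required business key."""
    values = [stage_row.get(col) for col in key_columns]
    if any(v is None or str(v).strip() == "" for v in values):
        return None  # the source never recorded this relationship
    row = {"link_hk": hash_key(*values)}
    for col, value in zip(key_columns, values):
        row[col + "_hk"] = hash_key(value)
    return row


# Hypothetical stage rows: the POS system knows nothing about the customer.
pos_row = {"receipt_id": "R100", "store_id": "S1", "product_id": "P1"}
loyalty_row = {"receipt_id": "R100", "customer_id": "C1"}

# The POS row supports a link over the keys it actually carries...
sale_link = build_link_row(["receipt_id", "store_id", "product_id"], pos_row)

# ...but not a store/product/customer link, because customer_id is absent.
three_way_link = build_link_row(["store_id", "product_id", "customer_id"], pos_row)

# The loyalty row supports its own relationship between receipt and customer.
receipt_customer_link = build_link_row(["receipt_id", "customer_id"], loyalty_row)
```

Here `three_way_link` is `None`, while each source still records the relationship it really observed.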


What @mrcool4 said… you cannot record an interaction if it did not happen


But won’t I end up building a source system vault in this case :) if I follow what the source dictates?
This link follows the business ontology (even one as simple as a POS sale).

Nope, SSDV refers to creating hubs for every code and key you see, and it does not reflect the business architecture.