ETL pattern for Stage->DV

KostyaRico · 21 April 2024 22:10

Hello, dear community!
I’ve been trying to work out for myself the best loading pattern for the DV core objects from staging tables.

Let’s say we have N staging tables from M source systems.
What is the best approach to loading hubs, links and sats, given the options I can think of:

1. Treat each stage table as an atomic pipeline, one that is responsible for loading all the core objects that depend on that stage table. The processes are then decoupled and can be scheduled at whatever rate the stage table is refreshed.
2. Move data layer by layer: from landing to staging, and then to core ONLY WHEN ALL the staging is done. In that case we have pipelines per hub, sat or link, each of which may read from several stage tables.

p.1 seems better to me, as we can have different load rates depending on business area (domain), as opposed to p.2, where we need to wait for all domain data to arrive.
(In my project, different business areas refresh at different rates.)
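To make the scheduling difference between the two options concrete, here is a minimal sketch. The stage table names and their mapping to hubs, links and sats are hypothetical, and the readiness rule for option 2 follows the "ONLY WHEN ALL the staging is done" wording above. It only decides which pipelines may start; it does not load anything.

```python
# Hypothetical mapping of stage tables to the DV core objects they feed.
STAGE_TO_TARGETS = {
    "stg_pos_sales": ["hub_store", "hub_product", "lnk_store_product"],
    "stg_loyalty_receipts": ["hub_customer", "sat_customer_loyalty"],
    "stg_erp_products": ["hub_product", "sat_product_erp"],
}


def pipelines_per_target(mapping):
    """Option 2 layout: one pipeline per hub/link/sat, listing every stage table it reads."""
    by_target = {}
    for stage, targets in mapping.items():
        for target in targets:
            by_target.setdefault(target, []).append(stage)
    return {target: sorted(stages) for target, stages in by_target.items()}


def runnable_now(mapping, refreshed, option):
    """Return which pipelines may start, given the set of stage tables refreshed so far."""
    if option == 1:
        # Option 1: each stage table's pipeline loads all of its targets and depends
        # only on that one stage table, so it can start as soon as it is refreshed.
        return sorted(stage for stage in mapping if stage in refreshed)
    # Option 2: the core layer starts only once ALL staging is done.
    if set(mapping) <= set(refreshed):
        return sorted(pipelines_per_target(mapping))
    return []


print(runnable_now(STAGE_TO_TARGETS, {"stg_pos_sales"}, option=1))  # ['stg_pos_sales']
print(runnable_now(STAGE_TO_TARGETS, {"stg_pos_sales"}, option=2))  # []
```

Note that `hub_product` is fed by two stage tables, so under option 1 two independent pipelines write to it. That is only safe if the hub load is insert-only on the business key, so whichever pipeline arrives second simply finds the key already present.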

Or maybe I’m looking at this the wrong way completely?
Thank you!

patrickcuba · 20 May 2024 07:39

Yup… each landed content should be its own ‘universe’ and independently load its mapped hubs, links and sats.
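A minimal sketch of what one such independent "universe" could look like, using in-memory dicts in place of real tables. Everything here is an assumption for illustration: the stage table `stg_pos_sales`, its columns, the target object names, and the MD5-over-`||` hash-key convention. Hubs and the link are insert-only, and the satellite gets a new version only when its hashdiff changes, so re-running the same batch is harmless.

```python
import hashlib


def hash_key(*business_keys):
    """Hypothetical convention: MD5 of trimmed, upper-cased keys joined with '||'."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def new_vault():
    """In-memory stand-ins for the DV tables this one stage table is mapped to."""
    return {"hub_store": {}, "hub_product": {}, "lnk_store_product": {}, "sat_product_pos": {}}


def load_stg_pos_sales(rows, vault, load_ts, record_source="POS"):
    """Load every DV object mapped to one (hypothetical) stage table,
    with no dependency on any other stage table."""
    for row in rows:
        store_hk = hash_key(row["store_id"])
        product_hk = hash_key(row["product_id"])
        link_hk = hash_key(row["store_id"], row["product_id"])

        # Hubs and links are insert-only: the first load of a key wins; reloads are no-ops.
        vault["hub_store"].setdefault(store_hk, (row["store_id"], load_ts, record_source))
        vault["hub_product"].setdefault(product_hk, (row["product_id"], load_ts, record_source))
        vault["lnk_store_product"].setdefault(
            link_hk, (store_hk, product_hk, load_ts, record_source))

        # Satellite: append a new version only when the descriptive attributes changed.
        hashdiff = hash_key(row["product_name"], row["price"])
        history = vault["sat_product_pos"].setdefault(product_hk, [])
        if not history or history[-1]["hashdiff"] != hashdiff:
            history.append({"hashdiff": hashdiff, "product_name": row["product_name"],
                            "price": row["price"], "load_ts": load_ts,
                            "record_source": record_source})


vault = new_vault()
batch = [{"store_id": "S1", "product_id": "P1", "product_name": "Milk", "price": "1.50"}]
load_stg_pos_sales(batch, vault, "2024-05-20")
load_stg_pos_sales(batch, vault, "2024-05-21")  # same content: nothing new is inserted
batch[0]["price"] = "1.75"
load_stg_pos_sales(batch, vault, "2024-05-22")  # price changed: one new sat version
print(len(vault["hub_product"]), len(vault["sat_product_pos"][hash_key("P1")]))  # 1 2
```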

KostyaRico · 21 May 2024 16:00

Many thanks for your reply!
But having said this, what if my source table data is insufficient to fill a link table?
For example, I have a sale in a retail store (so the linked hubs are store, product and customer).
System 1 has info only on the store and product in a transaction; the customer for this transaction can be sourced only from system 2.
For instance, system 1 is the POS system and system 2 is the customer loyalty system, which integrates with system 1 via cash receipts to bring in the customer ID.

mrcool4 · 21 May 2024 22:55

If no relationship with the customer exists in System 1, you cannot create a link that links the store, product and customer hubs.

patrickcuba · 21 May 2024 23:05

What @mrcool4 said… you cannot record an interaction if it did not happen
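One possible reading of this advice, sketched under assumptions the thread does not spell out: the cash receipt number is the key both systems share, so each source loads only the link its own data actually records (POS: store, product, receipt; loyalty: receipt, customer). The store/product/customer view is then assembled downstream by joining through the receipt. All names are hypothetical, and the hash-key convention is the same illustrative MD5-over-`||` scheme.

```python
import hashlib


def hash_key(*business_keys):
    """Hypothetical convention: MD5 of trimmed, upper-cased keys joined with '||'."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_pos_link(pos_rows):
    """System 1 (POS) records store, product and receipt together, so it loads that link."""
    return {hash_key(r["store_id"], r["product_id"], r["receipt_no"]):
            (r["store_id"], r["product_id"], r["receipt_no"]) for r in pos_rows}


def load_loyalty_link(loyalty_rows):
    """System 2 (loyalty) records only receipt -> customer, so it loads a separate link."""
    return {hash_key(r["receipt_no"], r["customer_id"]): (r["receipt_no"], r["customer_id"])
            for r in loyalty_rows}


def sales_with_customer(pos_link, loyalty_link):
    """Downstream query: combine both links through the shared receipt number.
    Receipts with no loyalty record get customer None instead of being dropped."""
    customer_by_receipt = {receipt: customer for receipt, customer in loyalty_link.values()}
    rows = [(store, product, receipt, customer_by_receipt.get(receipt))
            for store, product, receipt in pos_link.values()]
    return sorted(rows, key=lambda r: r[:3])


pos = load_pos_link([
    {"store_id": "S1", "product_id": "P1", "receipt_no": "R1"},
    {"store_id": "S1", "product_id": "P2", "receipt_no": "R1"},
    {"store_id": "S2", "product_id": "P1", "receipt_no": "R2"},
])
loyalty = load_loyalty_link([{"receipt_no": "R1", "customer_id": "C9"}])
for row in sales_with_customer(pos, loyalty):
    print(row)
```

This prints `('S1', 'P1', 'R1', 'C9')`, `('S1', 'P2', 'R1', 'C9')` and `('S2', 'P1', 'R2', None)`. Neither source ever writes a link for a relationship it did not record; the receipt without a loyalty match simply has no customer.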

KostyaRico · 22 May 2024 08:11

But won’t I end up building a source system vault in this case, if I follow the way the source dictates?
This link follows the business ontology (even a simple one, such as a POS sale).

patrickcuba · 22 May 2024 21:12

Nope, SSDV (source system data vault) refers to creating hubs for every code and key you see; it does not reflect the business architecture.
