Concerning link loading, we can read in The Data Vault Guru:

> Has the staged source file of the link table been used to supply content to parent hub tables?
Should we understand that when a link is inserted we also have to insert (if not already present) all related hubs?
More generally, should we always insert into the parent table (hub or link) when we insert into a child table (link or satellite)?
To keep architecture implementations simple you should have landing, staging, and loading. DV loading comes from what was staged.
If a landed file contains a relationship between business objects, then you will use:
- a link loader and
- 2 hub loaders

Hubs are parent tables of a link or satellite.
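A minimal sketch of that rule (table and column names are hypothetical, and SQLite via Python's sqlite3 stands in for a real warehouse and DV automation tool): one staged file carrying a Customer-to-Order relationship feeds two hub loaders and one link loader.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical minimal DV tables: two hubs and the link between them.
cur.executescript("""
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
CREATE TABLE hub_order    (order_bk    TEXT PRIMARY KEY);
CREATE TABLE link_customer_order (
    customer_bk TEXT,
    order_bk    TEXT,
    PRIMARY KEY (customer_bk, order_bk)
);
-- Staged file containing a relationship between business objects.
CREATE TABLE stg_customer_order (customer_bk TEXT, order_bk TEXT);
INSERT INTO stg_customer_order VALUES ('C1', 'O1'), ('C1', 'O2');
""")

# Two hub loaders: each inserts the business keys found in the staged
# file, skipping keys that are already present in the hub.
cur.execute("INSERT OR IGNORE INTO hub_customer "
            "SELECT DISTINCT customer_bk FROM stg_customer_order")
cur.execute("INSERT OR IGNORE INTO hub_order "
            "SELECT DISTINCT order_bk FROM stg_customer_order")

# One link loader for the relationship itself.
cur.execute("INSERT OR IGNORE INTO link_customer_order "
            "SELECT DISTINCT customer_bk, order_bk FROM stg_customer_order")
con.commit()

print(cur.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0])        # 1
print(cur.execute("SELECT COUNT(*) FROM hub_order").fetchone()[0])           # 2
print(cur.execute("SELECT COUNT(*) FROM link_customer_order").fetchone()[0]) # 2
```

Note that the hubs are populated from the same staged file as the link, so the parent rows always exist even when no satellite row accompanies them.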
I was wondering the same myself.
What if you have a database source that enforces referential integrity, from which you do a direct pull?
I have table Customer_Core with CustomerID which gets loaded to Hub_Customer and Sat_Customer_Core.
Then I have table Customer_Additional with CustomerID which gets loaded to Sat_Customer_Additional. CustomerID in this table cannot exist if it is not in Customer_Core. Should I still load Hub_Customer from this table?
I would say yes: load from all possible occurrences in all sources. You’ll take a performance hit, but since you’re loading a hub, which will probably never be that big, the hit isn’t significant.
Thanks for the reply.
The special case I was thinking about is when we load a link and we only have the business key (BK) of a referenced hub in the staged table.
I note that we always insert into the parent table even if there is no descriptive data to add to a satellite.
What performance hit?
Hub, link and Sat loaders are autonomous tasks running in parallel against very thin tables.
Having to do a select with unions over all sources, as opposed to doing a single-source select.
That is a dbt limitation, not a DV limitation
Well it’s not a dbtvault limitation actually, they support the multi-source hub pattern. So no problems there. It’s just that the select with the unions will be slower, hence the performance hit I was referring to.
It is a dbt limitation.
dbt doesn’t allow multiple SQL files to load the same model (one file, one model), which is why dbtvault resorted to generating a UNION ALL.
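A minimal sketch of that UNION ALL workaround (hypothetical names, SQLite via Python's sqlite3 rather than dbt; dbtvault's actual generated SQL is more involved): because one model is one file, the multi-source hub is loaded by unioning all staged sources inside a single statement, so every source is scanned on every run.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
-- Two staged sources that both carry the customer business key.
CREATE TABLE stg_customer_core       (customer_id TEXT);
CREATE TABLE stg_customer_additional (customer_id TEXT);
INSERT INTO stg_customer_core       VALUES ('C1'), ('C2');
INSERT INTO stg_customer_additional VALUES ('C2'), ('C3');
""")

# One file, one model: all sources are unioned in a single select,
# which is the extra work behind the performance hit mentioned above.
cur.execute("""
INSERT OR IGNORE INTO hub_customer
SELECT DISTINCT customer_id FROM (
    SELECT customer_id FROM stg_customer_core
    UNION ALL
    SELECT customer_id FROM stg_customer_additional
)
""")
con.commit()

print(cur.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0])  # 3
```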
Ideal DV automation is to have single-purpose hub loaders, 1-1 source to target, so you don’t read sources you don’t need when loading a hub.
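For contrast, a sketch of the single-purpose pattern (same hypothetical tables, SQLite standing in for the warehouse): one loader per staged source, each an autonomous task that can be scheduled, parallelised, and retried independently, and that never touches a source it doesn't need.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
CREATE TABLE stg_customer_core       (customer_id TEXT);
CREATE TABLE stg_customer_additional (customer_id TEXT);
INSERT INTO stg_customer_core       VALUES ('C1'), ('C2');
INSERT INTO stg_customer_additional VALUES ('C2'), ('C3');
""")

def load_hub_customer(source_table: str) -> None:
    """Single-purpose hub loader: one staged source, one target hub.

    If a run only landed one source, the other loader simply never
    fires, and no unneeded source is read.
    """
    cur.execute(
        f"INSERT OR IGNORE INTO hub_customer "
        f"SELECT DISTINCT customer_id FROM {source_table}"
    )

# Each loader is an independent task; idempotent inserts make
# recovery from failure a simple re-run.
load_hub_customer("stg_customer_core")
load_hub_customer("stg_customer_additional")
con.commit()

print(cur.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0])  # 3
```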
Ah right, yes, sorry for missing that point
And I guess the single-purpose hub loaders you mention are in reference to what you describe HERE related to Snowflake?
Autonomous, independent, microservice-style architecture that makes it easy to recover from failure; it follows the Unix mantra.