Hello everyone!
I’ve seen a number of sample articles relating to near-real-time integration on Snowflake. One of them advocates using a combination of staging tables, staging metadata views and streams to load the target raw vault hub/link/sat tables via a multi-table insert (MTI) embedded in a single task.
example:
https://quickstarts.snowflake.com/guide/vhol_data_vault/index.html?index=..%2F..index#0
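For reference, my reading of the pattern in that guide, paraphrased with my own placeholder names and heavily simplified (so treat this as a sketch of the idea rather than the guide's actual code), is roughly:

```sql
-- Append-only stream on the staging table captures newly landed rows
CREATE OR REPLACE STREAM stg_customer_strm ON TABLE stg_customer APPEND_ONLY = TRUE;

-- "Outbound" view over the stream derives the hash key and hashdiff for the raw vault
-- (hash logic simplified; real code would handle NULLs, delimiters, casing rules, etc.)
CREATE OR REPLACE VIEW stg_customer_strm_outbound AS
SELECT
    SHA1_BINARY(UPPER(TRIM(customer_id)))                                   AS hub_customer_hk,
    customer_id,
    SHA1_BINARY(CONCAT(TRIM(customer_name), '||', TRIM(customer_address)))  AS customer_hashdiff,
    customer_name,
    customer_address,
    load_dts,
    record_source
FROM stg_customer_strm;

-- One task, one multi-table insert, loading the hub and the sat in a single statement
CREATE OR REPLACE TASK customer_strm_tsk
  WAREHOUSE = dv_wh
  SCHEDULE  = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('STG_CUSTOMER_STRM')
AS
INSERT ALL
  -- only business keys not already in the hub
  WHEN (SELECT COUNT(1) FROM hub_customer tgt
        WHERE tgt.hub_customer_hk = src_hub_customer_hk) = 0 THEN
    INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
    VALUES (src_hub_customer_hk, src_customer_id, src_load_dts, src_record_source)
  -- only rows whose descriptive attributes changed (simplified satellite check)
  WHEN (SELECT COUNT(1) FROM sat_customer tgt
        WHERE tgt.hub_customer_hk   = src_hub_customer_hk
          AND tgt.customer_hashdiff = src_customer_hashdiff) = 0 THEN
    INTO sat_customer (hub_customer_hk, customer_hashdiff, customer_name,
                       customer_address, load_dts, record_source)
    VALUES (src_hub_customer_hk, src_customer_hashdiff, src_customer_name,
            src_customer_address, src_load_dts, src_record_source)
SELECT hub_customer_hk   AS src_hub_customer_hk,
       customer_id       AS src_customer_id,
       customer_hashdiff AS src_customer_hashdiff,
       customer_name     AS src_customer_name,
       customer_address  AS src_customer_address,
       load_dts          AS src_load_dts,
       record_source     AS src_record_source
FROM stg_customer_strm_outbound;
```

The appeal is obvious: one stream, one task, one statement, so the hub and sat stay in step and the stream has a single consumer.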
My question is whether having a single task do all of this is prudent, or whether creating a separate loader task for each target hub/link/sat is better. Multiple tasks will likely consume more compute than a single one, which matters on a consumption-based platform like Snowflake, but not enough to outweigh potential duplicates or other issues in the target tables.
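The alternative I have in mind is one loader task per target, something like the sketch below (again, placeholder names of my own). Note that, as I understand stream semantics, each consuming task would need its own stream on the staging table, because a committed DML read advances the stream's offset:

```sql
-- One stream per consumer: a stream's offset advances for whichever DML statement reads it first
CREATE OR REPLACE STREAM stg_customer_strm_hub ON TABLE stg_customer APPEND_ONLY = TRUE;
CREATE OR REPLACE STREAM stg_customer_strm_sat ON TABLE stg_customer APPEND_ONLY = TRUE;

-- Hub loader: its own task, its own stream, a plain INSERT ... SELECT with dedup in the query
CREATE OR REPLACE TASK hub_customer_tsk
  WAREHOUSE = dv_wh
  SCHEDULE  = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('STG_CUSTOMER_STRM_HUB')
AS
INSERT INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
SELECT
    SHA1_BINARY(UPPER(TRIM(src.customer_id))) AS hub_customer_hk,
    src.customer_id,
    src.load_dts,
    src.record_source
FROM stg_customer_strm_hub src
WHERE NOT EXISTS (
    SELECT 1
    FROM hub_customer tgt
    WHERE tgt.hub_customer_hk = SHA1_BINARY(UPPER(TRIM(src.customer_id)))
)
-- keep one row per business key within the batch (earliest load date)
QUALIFY ROW_NUMBER() OVER (PARTITION BY src.customer_id ORDER BY src.load_dts) = 1;

-- A sat_customer_tsk and any link loaders would follow the same shape against their own streams.
```

Hence the compute concern: several tasks waking up, scanning and deduplicating instead of one.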
I’m trying to get my head around @patrickcuba 's article on MTIs here:
…which suggests being careful with MTIs. Is this only an issue where multiple WHEN clauses hit the same target hub? What about when several staged tables can each supply the same business key, each with its own hub loader?
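To make the first case concrete, the shape I'm worried about is something like this (placeholder names again; a single staged sale row carrying two business keys, ship-to and bill-to, for the same customer hub):

```sql
INSERT ALL
  -- Both WHEN lookups see hub_customer as it was when the statement started, so if
  -- ship-to and bill-to are the same brand-new customer, both branches fire and the
  -- hub gets the key twice (as I understand the snapshot semantics).
  WHEN (SELECT COUNT(1) FROM hub_customer tgt
        WHERE tgt.hub_customer_hk = src_ship_to_hk) = 0 THEN
    INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
    VALUES (src_ship_to_hk, src_ship_to_customer_id, src_load_dts, src_record_source)
  WHEN (SELECT COUNT(1) FROM hub_customer tgt
        WHERE tgt.hub_customer_hk = src_bill_to_hk) = 0 THEN
    INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
    VALUES (src_bill_to_hk, src_bill_to_customer_id, src_load_dts, src_record_source)
SELECT SHA1_BINARY(UPPER(TRIM(ship_to_customer_id))) AS src_ship_to_hk,
       ship_to_customer_id                           AS src_ship_to_customer_id,
       SHA1_BINARY(UPPER(TRIM(bill_to_customer_id))) AS src_bill_to_hk,
       bill_to_customer_id                           AS src_bill_to_customer_id,
       load_dts                                      AS src_load_dts,
       record_source                                 AS src_record_source
FROM stg_sale_strm_outbound;
```

The second case, multiple staged tables each feeding its own hub loader for the same hub, looks to me like the same race, just across concurrent statements rather than WHEN clauses within one.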
Should I essentially ignore the Snowflake quickstart guide on this specific matter of getting data from staging into hubs/links/sats in parallel?