Multi-table insert from staging to raw on Snowflake

Hello everyone!
I’ve seen a number of sample articles relating to near-real time integration on a Snowflake platform. One article advocates using a combo of staging tables, staging metadata views and streams as a basis to load a target raw vault hub/link/sat table using a multi-table insert embedded within a single task.

example:
https://quickstarts.snowflake.com/guide/vhol_data_vault/index.html?index=..%2F..index#0

My question is whether having a single task to do this is prudent or whether creating separate loader tasks for each target hub/link/sat is better. Given that multiple tasks will likely consume more compute than a single one, on a consumption-based platform such as Snowflake it is an important consideration, but not one that trumps potential data duplicates or other issues in the target tables.

I’m trying to get my head around @patrickcuba 's article on MTIs here:

…which suggests being careful about using MTIs. Is this only an issue where you may have multiple when clauses hitting the same target hub? What about if you have multiple staged tables that may be the source of a business key with their own hub loader?

Should I essentially ignore the Snowflake quickstart guide on this specific matter of getting data from staging into hubs/links/sats in parallel?

Using MTI for DV is dangerous, see here: Data Vault Loader Traps. You might have already had to deal with… | by Patrick Cuba | Medium

The article pretty much dispels many of the techniques that quick start advocates for, it was published BEFORE I joined Snowflake.

Thanks Patrick - separate loaders it is for the raw vault load.

Is there anything else relating to the staging table that is problematic on Snowflake? Is leveraging a view (containing the DV metadata) on top of the staging table’s Stream object ok? Or is it better to persist the DV metadata within the staging table as separate loaders for hub, links sats would need to compute hashkeys etc each time the view is executed per loader.

Streams on Views - do that instead. Introduction to Streams | Snowflake Documentation

And yes, that’s a framework I have promoted here: Data Vault on Snowflake: Streams & Tasks on Views | by Patrick Cuba | Medium

And here: Kappa Vault. This article expands on “Data Vault… | by Patrick Cuba | Medium

1 Like