Hello everyone!
I’ve seen a number of sample articles relating to near-real-time integration on Snowflake. One of them advocates using a combination of staging tables, staging metadata views and streams to load the target raw vault hub/link/sat tables via a multi-table insert (MTI) embedded in a single task.
example:
https://quickstarts.snowflake.com/guide/vhol_data_vault/index.html?index=..%2F..index#0
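For reference, my reading of the pattern in that guide, paraphrased with my own placeholder names and heavily simplified (so treat this as a sketch of the idea rather than the guide's actual code), is roughly:

```sql
-- Append-only stream on the staging table captures newly landed rows
CREATE OR REPLACE STREAM stg_customer_strm ON TABLE stg_customer APPEND_ONLY = TRUE;

-- "Outbound" view over the stream derives the hash key and hashdiff for the raw vault
-- (hash logic simplified; real code would handle NULLs, delimiters, casing rules, etc.)
CREATE OR REPLACE VIEW stg_customer_strm_outbound AS
SELECT
    SHA1_BINARY(UPPER(TRIM(customer_id)))                                   AS hub_customer_hk,
    customer_id,
    SHA1_BINARY(CONCAT(TRIM(customer_name), '||', TRIM(customer_address)))  AS customer_hashdiff,
    customer_name,
    customer_address,
    load_dts,
    record_source
FROM stg_customer_strm;

-- One task, one multi-table insert, loading the hub and the sat in a single statement
CREATE OR REPLACE TASK customer_strm_tsk
  WAREHOUSE = dv_wh
  SCHEDULE  = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('STG_CUSTOMER_STRM')
AS
INSERT ALL
  -- only business keys not already in the hub
  WHEN (SELECT COUNT(1) FROM hub_customer tgt
        WHERE tgt.hub_customer_hk = src_hub_customer_hk) = 0 THEN
    INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
    VALUES (src_hub_customer_hk, src_customer_id, src_load_dts, src_record_source)
  -- only rows whose descriptive attributes changed (simplified satellite check)
  WHEN (SELECT COUNT(1) FROM sat_customer tgt
        WHERE tgt.hub_customer_hk   = src_hub_customer_hk
          AND tgt.customer_hashdiff = src_customer_hashdiff) = 0 THEN
    INTO sat_customer (hub_customer_hk, customer_hashdiff, customer_name,
                       customer_address, load_dts, record_source)
    VALUES (src_hub_customer_hk, src_customer_hashdiff, src_customer_name,
            src_customer_address, src_load_dts, src_record_source)
SELECT hub_customer_hk   AS src_hub_customer_hk,
       customer_id       AS src_customer_id,
       customer_hashdiff AS src_customer_hashdiff,
       customer_name     AS src_customer_name,
       customer_address  AS src_customer_address,
       load_dts          AS src_load_dts,
       record_source     AS src_record_source
FROM stg_customer_strm_outbound;
```

The appeal is obvious: one stream, one task, one statement, so the hub and sat stay in step and the stream has a single consumer.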
My question is whether having a single task do all of this is prudent, or whether creating a separate loader task for each target hub/link/sat is better. Multiple tasks will likely consume more compute than a single one, which matters on a consumption-based platform like Snowflake, but not enough to outweigh potential duplicates or other issues in the target tables.
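The alternative I have in mind is one loader task per target, something like the sketch below (again, placeholder names of my own). Note that, as I understand stream semantics, each consuming task would need its own stream on the staging table, because a committed DML read advances the stream's offset:

```sql
-- One stream per consumer: a stream's offset advances for whichever DML statement reads it first
CREATE OR REPLACE STREAM stg_customer_strm_hub ON TABLE stg_customer APPEND_ONLY = TRUE;
CREATE OR REPLACE STREAM stg_customer_strm_sat ON TABLE stg_customer APPEND_ONLY = TRUE;

-- Hub loader: its own task, its own stream, a plain INSERT ... SELECT with dedup in the query
CREATE OR REPLACE TASK hub_customer_tsk
  WAREHOUSE = dv_wh
  SCHEDULE  = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('STG_CUSTOMER_STRM_HUB')
AS
INSERT INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
SELECT
    SHA1_BINARY(UPPER(TRIM(src.customer_id))) AS hub_customer_hk,
    src.customer_id,
    src.load_dts,
    src.record_source
FROM stg_customer_strm_hub src
WHERE NOT EXISTS (
    SELECT 1
    FROM hub_customer tgt
    WHERE tgt.hub_customer_hk = SHA1_BINARY(UPPER(TRIM(src.customer_id)))
)
-- keep one row per business key within the batch (earliest load date)
QUALIFY ROW_NUMBER() OVER (PARTITION BY src.customer_id ORDER BY src.load_dts) = 1;

-- A sat_customer_tsk and any link loaders would follow the same shape against their own streams.
```

Hence the compute concern: several tasks waking up, scanning and deduplicating instead of one.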
I’m trying to get my head around @patrickcuba 's article on MTIs here:
…which suggests being careful with MTIs. Is this only an issue where multiple WHEN clauses hit the same target hub? What about when several staged tables can each supply the same business key, each with its own hub loader?
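To make the first case concrete, the shape I'm worried about is something like this (placeholder names again; a single staged sale row carrying two business keys, ship-to and bill-to, for the same customer hub):

```sql
INSERT ALL
  -- Both WHEN lookups see hub_customer as it was when the statement started, so if
  -- ship-to and bill-to are the same brand-new customer, both branches fire and the
  -- hub gets the key twice (as I understand the snapshot semantics).
  WHEN (SELECT COUNT(1) FROM hub_customer tgt
        WHERE tgt.hub_customer_hk = src_ship_to_hk) = 0 THEN
    INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
    VALUES (src_ship_to_hk, src_ship_to_customer_id, src_load_dts, src_record_source)
  WHEN (SELECT COUNT(1) FROM hub_customer tgt
        WHERE tgt.hub_customer_hk = src_bill_to_hk) = 0 THEN
    INTO hub_customer (hub_customer_hk, customer_id, load_dts, record_source)
    VALUES (src_bill_to_hk, src_bill_to_customer_id, src_load_dts, src_record_source)
SELECT SHA1_BINARY(UPPER(TRIM(ship_to_customer_id))) AS src_ship_to_hk,
       ship_to_customer_id                           AS src_ship_to_customer_id,
       SHA1_BINARY(UPPER(TRIM(bill_to_customer_id))) AS src_bill_to_hk,
       bill_to_customer_id                           AS src_bill_to_customer_id,
       load_dts                                      AS src_load_dts,
       record_source                                 AS src_record_source
FROM stg_sale_strm_outbound;
```

The second case, multiple staged tables each feeding its own hub loader for the same hub, looks to me like the same race, just across concurrent statements rather than WHEN clauses within one.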
Should I essentially ignore the Snowflake quickstart guide on this specific matter of getting data from staging into hubs/links/sats in parallel?