We have an audit requirement to capture the INSERT row counts for each HUB, LINK and SAT. Currently I have implemented a Python wrapper that performs the update of the Data Vault as a single Snowflake transaction. The wrapper also logs to AWS CloudWatch the number of records INSERTed, as returned by each SQL INSERT.
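For anyone following along, the pattern described above can be sketched roughly like this. This is a minimal, hedged sketch: the table names, statement list, and the stub connection in the test are all hypothetical, and a real implementation would use `snowflake.connector` for the connection and ship the JSON log lines to CloudWatch via the CloudWatch agent or `boto3` rather than plain `logging`.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dv_loader")

def run_dv_load(conn, statements):
    """Run all HUB/LINK/SAT INSERTs as one transaction and return
    {table_name: rows_inserted}.

    Counts are only logged after a successful COMMIT, so a rolled-back
    load never reports row counts (per the advice below about logging
    coping with rolled-back transactions).
    """
    counts = {}
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        for table, sql in statements:
            cur.execute(sql)
            # rowcount is the number of rows affected by this INSERT
            counts[table] = cur.rowcount
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise
    # One structured record per table, not per transaction, so each
    # HUB/LINK/SAT gets its own auditable metric line.
    for table, n in counts.items():
        log.info(json.dumps({"table": table, "rows_inserted": n}))
    return counts
```

Logging one JSON line per table keeps the metric queryable in CloudWatch Logs Insights without parsing free text.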
I am wondering if there is a better approach to capture and retain this metric for each HUB, LINK and SAT?
I’m not on Snowflake/AWS, but from a technology-agnostic standpoint, that sounds like what needs to be done. To capture the number of rows inserted, you need something that can return that count, and then something to log it.
So long as your logging can deal with rolled-back transactions, and logs counts per individual table rather than per multi-table transaction, you’ll be fine.
Technology-specific advice welcome.
You could adopt the approach documented here:
Have you looked into Snowflake Streams?
These are change-tracking / change-data-capture objects that record any DML changes to a table (inserts, updates, deletes, etc.).
Specifically for inserts, you could use an append-only stream, which tracks only the inserts to standard tables in Snowflake. My thinking is that after each table load you could query that table’s stream from Python to retrieve the number of inserts, and then log it to AWS CloudWatch.
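As a rough illustration of that idea, a small helper could build the count query against the stream (the stream name here is made up). One caveat worth knowing: a plain SELECT against a stream does not advance its offset; the offset only moves forward when the stream is consumed inside a DML statement, so a read-only count like this can be repeated safely.

```python
def stream_insert_count_sql(stream_name: str) -> str:
    """Build a query counting the inserts an append-only stream has
    captured since its offset was last advanced.

    In an append-only stream every change row is an insert, so the
    METADATA$ACTION filter is belt-and-braces here.
    """
    return (
        f"SELECT COUNT(*) AS rows_inserted "
        f"FROM {stream_name} "
        f"WHERE METADATA$ACTION = 'INSERT'"
    )
```

You would run the returned SQL through your existing Snowflake connection after each load and push the single count to CloudWatch.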
A point to note is that Streams don’t retain a history of changes. To build a history you could write a task that queries the stream and stages the result, which is then loaded into a change-history table. This table could then be used in downstream analytics etc.
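A sketch of that task, generated as a DDL string (all object names are hypothetical, and the schedule is arbitrary). Because the INSERT ... SELECT consumes the stream as DML, it advances the stream offset, which is what makes the history table the durable record. Note that SELECT * on a stream also returns the METADATA$ columns, so the history table would need columns for those.

```python
def change_history_task_sql(task: str, warehouse: str,
                            stream: str, history_table: str) -> str:
    """Build a CREATE TASK statement that periodically drains a stream
    into a change-history table, but only when the stream has data."""
    return f"""
CREATE OR REPLACE TASK {task}
  WAREHOUSE = {warehouse}
  SCHEDULE = '60 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('{stream}')
AS
  INSERT INTO {history_table}
  SELECT *, CURRENT_TIMESTAMP() AS captured_at
  FROM {stream};
""".strip()
```

The `WHEN SYSTEM$STREAM_HAS_DATA(...)` guard means the task run is skipped (and not billed for the insert) when there is nothing new to capture.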
For some more information on Snowflake Streams, take a look at the documentation: Change Tracking Using Table Streams — Snowflake Documentation
I hope this is of some help to you!
As each target table has a Source and Load_Datetime column, and we run an insert-only loading pattern, you can also count the inserts with a sweep SQL script after the load (select count(*) from the table where Load_Datetime equals the last load’s Load_Datetime). A further benefit is that you can then use the results of those counts as a new staging table (insert into stg_load_stats as select …) and load that data into a hub/NHL (non-historised link) — a “meta vault”: a vault table holding the metadata about load volumes.
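The sweep query can be demonstrated end-to-end with SQLite as a stand-in for Snowflake (the `hub_customer` table, its columns, and the sample rows are all invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hub_customer (hk TEXT, source TEXT, load_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO hub_customer VALUES (?, ?, ?)",
    [
        ("a", "crm", "2024-01-02"),  # latest load: two rows
        ("b", "crm", "2024-01-02"),
        ("c", "crm", "2024-01-01"),  # previous load
    ],
)

# Sweep query: count only the rows belonging to the most recent load.
row = conn.execute("""
    SELECT source, load_datetime, COUNT(*) AS rows_inserted
    FROM hub_customer
    WHERE load_datetime = (SELECT MAX(load_datetime) FROM hub_customer)
    GROUP BY source, load_datetime
""").fetchone()
# row -> ("crm", "2024-01-02", 2)
```

The same SELECT, pointed at a stg_load_stats table instead of fetched into Python, gives the staging step for the meta vault.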
Hi @neil.strange. This is a good approach, but it won’t capture the case where there were no INSERTs because there was no new data, or where there was a failure and the INSERT script didn’t even run.
If there were no inserts you’d see a zero count. You can also use a record tracking satellite or status tracking satellite to log every feed and status change. And finally, you can log every batch run in a waterlevel table holding the last successful load date for each table and feed.
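That waterlevel table is simple to sketch; again using SQLite as a stand-in for Snowflake, with invented table and feed names. A feed that ran with zero inserts still gets its waterlevel moved forward, which distinguishes “ran with no new data” from “didn’t run at all”.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE waterlevel (
        table_name   TEXT,
        feed         TEXT,
        last_success TEXT,
        PRIMARY KEY (table_name, feed)
    )
""")

def mark_success(conn, table_name, feed, load_dt):
    """Upsert the last successful load date for one table/feed pair:
    a first run inserts the row, a rerun moves the waterlevel forward."""
    conn.execute(
        "INSERT INTO waterlevel VALUES (?, ?, ?) "
        "ON CONFLICT(table_name, feed) "
        "DO UPDATE SET last_success = excluded.last_success",
        (table_name, feed, load_dt),
    )

mark_success(conn, "HUB_CUSTOMER", "crm", "2024-01-01")
mark_success(conn, "HUB_CUSTOMER", "crm", "2024-01-02")  # rerun advances it
```

A batch that fails before `mark_success` leaves the waterlevel untouched, so a stale date is your signal that a feed never completed.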