I’ve just completed the training, but I’m still stuck. Could someone please point me in the direction of some example SQL to create a PIT table, sourced from a HUB that uses a SAL?
We have a HUB for customers. We have multiple disparate systems, with different business keys. We have a table that holds the relationships between the various business keys, but not all keys exist in this table as not all keys have a relationship (yet).
We want to build a PIT table, and be able to see the SAT from system A, aligned to the SATs of systems B, C & D, even though the business keys are different.
I understand the concept of the SAL, and that it can only have two columns - parent & child - so for a business key that had relationships across all 4 systems, maybe the first question is ‘would I expect to see 3 records - A-B, B-C, C-D - in the same SAL, or would I have multiple SALs - one for each system-system relationship?’
Following from that, in either scenario, there would be business keys left in my hub that aren’t referenced in the SAL(s). So what would a query look like that built things up ready for the PIT table?
Any help would be greatly appreciated.
What does the incoming source data look like?
One source is event data, coming through as a json payload. It has a UUID key.
Second source is from a MS SQL based system. It has an incremental integer key.
Third source is Salesforce, synced with FiveTran. It has a case sensitive VARCHAR key.
If the original record started in the first system, it should have also found its way into the third, and Salesforce has an external id column to capture the UUID.
If the original record started in the first system after 1st July and was related to a specific business unit, it should have also synced to the second system. The second system will have the Salesforce id in an external field (as it will have gone via Salesforce).
If the record originated on Salesforce, it will only reside on Salesforce.
If the record originates on the second system, it may only reside on that system, or it may have synced to Salesforce (based on a business rule). If it was synced, Salesforce has the incremental id saved under the ‘name’ field. I’d note that if the record in Salesforce didn’t come from the second system, the ‘name’ field will have a text string in it that is useful to the business, so there is a bit of logic required here (but I don’t want to get caught up on that in this thread).
There are other systems, but hopefully that gives you an idea? We don’t have an MDM solution, and are not trying to make DV and MDM solution. However where a relationship can clearly be identified from on of the sources, we need to store that in a SAL, and then when the business is asking for a Type2 SCD in the datamart, we need to be able to pull this all together into a PIT table.
Does that answer your question?
- You need to decide what you want to surface into the PIT. I assume the SAL is mapping between business keys and other surrogate keys and you want to expose the business key. You’ll need to resolve that SAL and use the outcome as a base for your PIT construction.
- I also assume the complexity is around the fact that your sats are based on surrogate keys or a mix of them. Remember, the PIT table has “pointers” to those satellite tables, therefore your PIT will be a construct that holds the business key and the relevant satellite hashkeys to those sats.
In short, you’re going to need an interim step to serve as a base for your PIT, or, if you can, resolve this complexity in your PIT code. I published a sample PIT code here: https://www.snowflake.com/blog/dynamic-tables-data-vault/
The code breaks up the components needed for the PIT into CTEs, you could essentially resolve the complexity within one of the CTEs