I am interested to learn more about how people are using “record source” in their Data Vaults when they stage the data.
Do you use it to denote:
- Ultimate source application
- Ultimate source application and domain area
- Ultimate source application and table name
- Ultimate source application, table name and unique record identifier (system primary key or row id)
- Ultimate source application and screen name
- Something else?
What about the mode of transport, say if the record is delivered in File xyz.csv - do you capture the file and record of the file instead of, or as well as the source app details?
Absolutely, it could even be 1-1 tiwth the partition_by column on parquet
imho you could have
– full_path, file_name, file_type
– api name
– business rule name and version, which could be simply what the file name is if you save business rule outcomes under a filename that is 1-1 as the business rule name!
Whatever you deem necessary to tell you where that record came from!
Keen to hear other variations too!
I’m currently in Azure SQLMI with D365 as the data source channelled in near real time using Azure Synapse Link (append mode)
My strategy is:
- Record Source is the full name of the csv file (ADLS Account url/Container/entityName/fileName
- I have added Record Number to the Satellites which is the the line of the CSV file where the source record is found
- I have a table for logging the load process in more detail. It contains:
- Byte offset of this record
- Row size in bytes
- Row size in fields - helps with schema drift
with the LDTS I can find the code version in the code repo.