I am interested to learn more about how people are using “record source” in their Data Vaults when they stage the data.
Do you use it to denote:
Ultimate source application
Ultimate source application and domain area
Ultimate source application and table name
Ultimate source application, table name and unique record identifier (system primary key or row id)
Ultimate source application and screen name
Something else?
What about the mode of transport, say if the record is delivered in File xyz.csv - do you capture the file and record of the file instead of, or as well as the source app details?
Absolutely, it could even be 1-1 tiwth the partition_by column on parquet
imho you could have
– full_path, file_name, file_type
– api name
– business rule name and version, which could be simply what the file name is if you save business rule outcomes under a filename that is 1-1 as the business rule name!
Whatever you deem necessary to tell you where that record came from!