Reason for Record Source Column in Link tables

This may be a Data Vault 2.0 (101) question, but I haven’t found an answer to it. What is the record source column used for in links, when links only reference hubs and the hubs already carry the record source info?

Thank you

Clay

Hi, Clay. The hubs capture the first record source that provides a given key. Links likewise capture only the first record source for a given set of keys, but the source for the link is the source for the relationship between the hubs involved, as opposed to the source for just one hub. These may be the same, but not necessarily.
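
To make that concrete, here is a minimal Python sketch rather than a real loading pattern. The source names (CRM, ERP, WEBSHOP), the keys, and the `first_seen` helper are all made up for illustration, and plain business keys stand in for the hash keys a real vault would normally use.

```python
# Hypothetical staged rows in arrival order:
# (record_source, customer_key, order_key)
staged = [
    ("CRM", "C1", None),      # CRM is first to supply customer C1
    ("ERP", None, "O9"),      # ERP is first to supply order O9
    ("WEBSHOP", "C1", "O9"),  # WEBSHOP is first to relate C1 to O9
]


def first_seen(rows, key_of):
    """Map each key to the record source of the first row that supplies it."""
    seen = {}
    for row in rows:
        key = key_of(row)
        if key is not None and key not in seen:
            seen[key] = row[0]
    return seen


def link_key(row):
    """A link row needs every participating key to be present."""
    _, customer_key, order_key = row
    if customer_key is None or order_key is None:
        return None
    return (customer_key, order_key)


hub_customer = first_seen(staged, lambda r: r[1])
hub_order = first_seen(staged, lambda r: r[2])
link_customer_order = first_seen(staged, link_key)

print(hub_customer)         # {'C1': 'CRM'}
print(hub_order)            # {'O9': 'ERP'}
print(link_customer_order)  # {('C1', 'O9'): 'WEBSHOP'}
```

Here neither hub could tell you that WEBSHOP was the first source to assert the relationship; only the link's own record source holds that.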

Thank you Christopher.
Clay

It’s also important to understand here that we do not load any raw vault tables from other raw vault tables, i.e., our link does not depend directly on the hub, only on the raw source tables/staging tables in the warehouse. This avoids creating dependencies between vault tables and allows them to be loaded in parallel.

We load the links in parallel, usually from the same sources that supply the data for the concepts we are representing in our hubs.

Therefore, we must also include the record source in the link the same way we do in the hub. This is true for all Raw Vault tables.

Record Source and Load Date are required columns across every raw vault structure for this reason and to retain audit everywhere.
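
As a rough illustration of the points above, the following Python sketch loads two hubs and a link concurrently from the same staging rows, with each loader stamping its own record source and load date. Everything in it is an assumption made for the example: the `STAGING` rows, the table and column names, and the `load` helper are hypothetical, in-memory dictionaries stand in for database tables, and plain business keys stand in for hash keys.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

# Hypothetical staging rows; in practice this would be a staging table.
STAGING = [
    {"record_source": "WEBSHOP", "customer_key": "C1", "order_key": "O9"},
    {"record_source": "WEBSHOP", "customer_key": "C2", "order_key": "O10"},
]


def load(key_columns, load_date):
    """Build one raw vault table from staging only.

    The loader never reads another vault table, so hubs and links
    have no load-order dependency on each other.
    """
    table = {}
    for row in STAGING:
        key = tuple(row[column] for column in key_columns)
        # setdefault keeps the first record source seen for each key.
        table.setdefault(
            key,
            {"record_source": row["record_source"], "load_date": load_date},
        )
    return table


load_date = datetime.now(timezone.utc)
targets = {
    "hub_customer": ("customer_key",),
    "hub_order": ("order_key",),
    "link_customer_order": ("customer_key", "order_key"),
}

# Each table is loaded independently, so the loads can run side by side.
with ThreadPoolExecutor() as pool:
    futures = {
        name: pool.submit(load, columns, load_date)
        for name, columns in targets.items()
    }
    vault = {name: future.result() for name, future in futures.items()}

for name, table in vault.items():
    print(name, sorted(table))
```

Because the link loader cannot look anything up in the hubs, the only way it can know where a relationship came from is to take the record source from the staging row itself.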

Thank you Alex,
Clay