Effect on link structure if source table changes structure

Hello All,

I have been thinking about this?

table 1 - car (car_id)

table 2 - customer (cust_id)

table 3 - salesinvoice (inv_id(pk),card_id(fk),cust_id(fk))

When i generate data vault model on salesinvoice table i got Link with below structure

link_hkey

inv_hkey

car_hkey

cust_hkey

ldts

rsrc

Now my source salesinvoice table changed its structure. it got one more foreign key called salesperson_id

table 3 - salesinvoice (inv_id(pk),card_id(fk),cust_id(fk),salesperson_id(fk))

Now how this change will effect on link table.

Do we need to create new link in this scenario with new structure by including new salesperson hash key as below or already existing link gets this change ?

link_hkey

inv_hkey

car_hkey

cust_hkey

salesperson_hkey

ldts

rsrc

If yes how can we add that to existing link which already has data ?

Thanks in advance :slightly_smiling_face:

The business rule has changed = new link table

2 Likes

Thanks for your reply @patrickcuba.

The best point about DV i seen is easy adaptability which is one of the main reason for organizations to choose Data vault . Data vault model adopts changes easily if any new source structure introduced or any frequent changes in the source structure by simply adding new objects to the model.

But,

in the above scenario, imagine organizations have around 50 sources and has data in giga bites. in such case adding new link and loading huge data is costly operation. and what if same kind of change happens again on the same source table (got another foreign key). another new link need to be created with correct grain and load all the data from source. so in future there may be multiple versions of link tables.

it becomes difficult in reporting/BI layer on joining all the tables for data analysis right ?

How can we handle this type of scenarios ?

Easy adaptability is that if any new source table added or descriptive columns added then we can simply create new satellite table with new columns and hang to respective existing hub table. but in case of hub or link, we cannot do that simply.

Patrick is right that there needs to be a new link table.

The question is whether that link table needs to be fully back-populated or not. That would be determined primarily by whether the source system had been back populated.

If it hasn’t been back-populated, you could start the new link table from the date of the first salesperson_id record (i.e. the business rule change date). You can then stop updating your original link table, and any downstream models would get switched over to point to a union of the original and new link tables (with a NULL in place of salesperson in the original link).

If it has been back-populated, then you would want to bring in all of the history and do a full load into your new link table. How you handle downstream models (simply repoint, or perform a union with cut-over dates) is a business decision.

I wouldn’t recommend pointing your dashboarding/reporting tool directly at the hubs/links/sats. Rather I would consider these the Silver layer in Medallion architecture, with an expectation of a Gold layer over the top that the business would report off. The Gold layer contains the downstream models I referred to above.

1 Like