I need some advice regarding best practices in hub/satellite modelisation.
I have 2 hub :
hub_employee and hub_company
I have a source that give me data about employee AND company (a bool on each line say if the data is personal => about the employee or ‘generic’ => about the company). Source :
firstName
famiyName
mail
phone
Company_ID
When it’s generic, names are null.
If I understand the best practice well, I should have one satellite per source. But if I put the satellite on the the hub_company, I loose information about which employee the data defines (as the relation is one to many).
Should I consider the source as 2 sources and split the satellite on the two hubs (company and employee)? Is it not a bad practice? Or should I consider another way to do it?
So e.g. you may get the mail address of the company or the mail address of the employee depending on the boolean?
If so, you can add the company info to a sat off hub_company and the employee data off hub_employee. prefilter in staging. You can have whatever numbers of satellites from one source table if they mean different things.
Is this also a link source for employee to company? Feels like it. Don’t miss this part out.
Exactly! The same is true for the phone (can be the professional phone of the employee or the generic phone of the company).
I am a little bit relieved, it’s what seems logical to me and what I have done (2 hubs, 2 satellites and 1 link).
I am quite new to the data vault modelisation, and it seems that a satellite per source was a strong rule. I wanted to avoid rookie mistake if there is a strong reason for a strict one source => one satellite.
Agree with Nat. Use the same source to stage into two Hub- Sat’s and Link, if needed.
Incidentally, how does your source generate unique BK’s for Employee and Company, pls?
Seems like they would have different business domains i.e. EmployeeId is usually a Number (system-generated at HR-Onboarding) whereas CompanyID could be its ABN/ACN? - or some other way?