We have the source in a 3NF with surrogate keys.
Sample Tables - > Employee (EID, EmpID, Name), Department (DID, DeptID, Name), Salary (EID, DID, Salary)
As per DV 2.0 principle, only actual business keys should be used in Natural Keys, this translate into
- hub_employee(h_emp_hash, empID) → This can be populated directly from Employee source table
- hub_department(h_dept_hash, deptID) → This can be populated directly from Department source table
3. link_employee_department(l_emp_dept_has,h_emp_hash, h_dept_hash) → The source for this would be salary table, which only has the surrogate keys of the parent tables. When i load the link, i need to get the natural keys by joining the respective parent tables from source?
There’s a lot more going on here…
- Get the content in, denormalised if you can
- Don’t enforce joins between pulled/pushed landed content, if you can
- Read up a bit on the process of DV modelling to understand that your question, is probably answered by reading up a bit! If we give verbose advice here to a topic like this it sounds like there are many importnt lessons not understood yet! Without at least some sort of foundation you would be adding doezens of topics to this forum! DV is a discipline
This is a really good question. Most dv folks will say that the employee table and the salary table must be joined so that the salary data can be delivered with empID and DeptID. This creates some challenges: Data processing is now dependent on three deliveries or joins has to be performed by the sending system.
An alternative(my opinion and not dv standard) would be to model both the identifiers as hubs from the source. The model becomes more extensive but all deliveries and processing can be done independent. No relations between entities will be lost.