I have a system , I have mimicked using Customer Product analogy as in diagram below
Based on this System -C always has subset of customers from System-A and System-B and produces some data sets (50+ in number) for customer_product relationship(s)
The Data Vault model would look like below :
TO pouplate DV we are using the workflow as below:
I have follow question
- The SYSTEM-B has only CUST_ID and SYSTEM-A has CUST_ID and CUST_SRC as biz.keys. I am thinking to design one single HUB_CUSTOMER with some default value in CUST_SRC for SYSTEM-B, is this approach correct?
- AS the source for HUB_CUSTOMER and HUB_PRODUCT are SYSTEM-A and SYSTEM-B ., do I really need to populate CUSTOMER_HUB from SYSTEM-C as it only has Biz.Keys for CUSTOMER_HUB and PRODUCT_HUB for all 50 data sets as that would lead to duplicates if I ran 50+ data sets in parallel in case of DBT workflows with storage as DELTA (no unique key constraint in DB ) , shouldn’t data from SYSTEM-C only populate LINK tables and individual satellites?
ANy help is greatly appreciated