Hi everybody!
I need some advice regarding best practices in hub column naming convention - especially for multiple columns representing a business key.
My project situation:
I have source tables providing business keys for a business object “Contract”.
The some tables provide the business key as a single column (e.g. contract_uuid) as well as in two columns (e.g. company + contract_number).
contract_uuid is a technical id, but very important because all others data is connected to it.
company + contract_number is the natural business identifier mainly used by business.
My Source:
SRC_TABLE_1
contract_uuid (contract_bk_1)
company (contract_bk_2.1)
contract_id (contract_bk_2.2)
Therefore I modeled two hubs. One for single column contract business keys, one for two columns contract business keys.
Also I modeled a same-as link to connect both hubs.
I have choosen generic names because it is very likely that there will be more business keys (single columns as well as multiple columns) coming from different source systems or tables. I want to store them in the same hubs to create a lightweighted model
My DV Model:
HUB_CONTRACT
hk_contract (hashkey: hash(contract_uuid))
id_contract (business key : contract_uuid from SRC_TABLE_1)
HUB_CONTRACT_2
hk_contract_2 (hashkey: hash(company+contract_id))
id_contract_1 (business key : company from SRC_TABLE_1 )
id_contract_2 (business key : contract_id from SRC_TABLE_1)
LNK_SA_CONTRACT__CONTRACT_2
hk_contract__contract_2
hk_contract
hk_contract_2
The challenges for me and the consumers are now:
Because of my generic names the model is hard to read:
- What is HUB_CONTRACT_2 ?
- What values are in id_contract_1 or id_contract_2 ?
It even gets more complex, if my very likely second or third source system comes into play providing more single and multiple column keys.
For example a natural identifier will be column IKER + OPSE. I plan to store them as well into HUB_CONTRACT_2.
- How does a consumer now only select records from HUB_CONTRACT_2, which id_contract_1 == company ?
The only way I see to solve this is to create multiple hubs.
In my case HUB_CONTRACT_2 would be renamed on entity level as well as its columns would be renamed:
HUB_company_contract
hk_company_contract (hashkey: hash(company+contract_id))
id_company (business key : company from SRC_TABLE_1 )
id_contract (business key : contract_id from SRC_TABLE_1)
… as well as the link …
LNK_SA_contract__company_contract
hk_contract__company_contract
hk_contract
hk_company_contract
But this solution has the big drawback that my model increases by unnecessary links and hubs … more joining at the end.
What is your design pattern or best solution for this situation ?
Are you using general names for columns and use one hub for each grain of natural keys ? Are you opening new Hubs with specific names ?
Many thanks for any advice !