Business subject but no business key

Hello everyone,

I’m seeking advice on modeling my DV.

Our team focuses on fraud and corporate solvency risks. We’ve collaborated with the business team to identify the key business objects: company, contract, financial statement…
For these, we’ve successfully established Business Keys.

However, we’re encountering difficulties with the “human” aspect of our data. Specifically, we lack (and likely will continue to lack) definitive BKs for entities such as shareholder, CEO, local elected representative …

We have no control over the multiple data sources, but we do have business logic that lets us attempt deduplication based on names and other available information.

I’m unsure how to integrate this data into the Raw Vault. The examples I found online (using a “weak” hub keyed on firstname + lastname + date of birth) do not seem right.

Many thanks for any advice!

Hello… is financial statement a business object?

Mastering the business objects you described sounds like a task for MDM (Master Data Management): somewhere, you must create a business key.

You might draw inspiration from this short story:

  • We were at a customer that handled debt collection for various clients: for example, when Walmart or any other retail / commercial entity fails to collect payment on a debt owed to it (credit card debt, etc.).
  • All these clients manage their business objects with account numbers, customer numbers or contract ids. When importing that data into the debt collector’s systems, we knew that whatever a client used as a business key was not guaranteed to be unique across the clients we managed.
  • So we developed a debt-collection system in which every debt handed over to us is given our own business key. That business key is what we loaded into hub_debtor: a reliable, unique, strong business key that our source system manages and that the support and business staff within the debt collector’s business can relate to.
  • The account number / customer id / contract id our clients used was merely loaded into an “external_id” attribute in a satellite table.
  • Internally we refer to our debtor_id, but with our clients we always discuss the debt using the external_id.
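To make the story concrete, here is a minimal Python sketch of the idea. Everything beyond hub_debtor, debtor_id and external_id is my own assumption for illustration: the DebtCollectionSystem class, the "D000001" id format, the MD5 hash key and the client names are hypothetical, not how the real system was built.

```python
import hashlib
from itertools import count


def hash_key(business_key):
    # Hash of the normalized business key; MD5 is only an illustrative choice.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


class DebtCollectionSystem:
    """Hypothetical operational system that issues and owns debtor_id."""

    def __init__(self):
        self._seq = count(1)
        self._issued = {}  # (client, external_id) -> debtor_id

    def register(self, client, external_id):
        # The same client + external id always gets the same debtor_id back.
        key = (client, external_id)
        if key not in self._issued:
            self._issued[key] = f"D{next(self._seq):06d}"
        return self._issued[key]


def load_vault(system, source_rows):
    hub_debtor = {}  # hash key -> hub row, keyed on our own business key
    sat_debtor = {}  # the client's identifier is only descriptive context
    for row in source_rows:
        debtor_id = system.register(row["client"], row["external_id"])
        hk = hash_key(debtor_id)
        hub_debtor.setdefault(hk, {"hk_debtor": hk, "debtor_id": debtor_id})
        sat_debtor.setdefault(
            (hk, row["client"], row["external_id"]),
            {"hk_debtor": hk, "client": row["client"], "external_id": row["external_id"]},
        )
    return hub_debtor, list(sat_debtor.values())


# Two clients reuse the same account number; the third row is a re-delivery.
source_rows = [
    {"client": "client_a", "external_id": "1001"},
    {"client": "client_b", "external_id": "1001"},
    {"client": "client_a", "external_id": "1001"},
]
system = DebtCollectionSystem()
hub_debtor, sat_debtor = load_vault(system, source_rows)
```

Both clients use "1001", yet the hub ends up with two distinct debtors because the key comes from our own system, while the colliding external_id lives only in the satellite.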

You’re right, “weak” hubs are an anti-pattern.

Hope that helps,

You are right, a financial statement may be a description in a satellite of the company.

I am not sure I understand how to apply your example. Our sources can have really poor data quality, and (more often than not) we can’t create a BK.

Our data looks like:
firstName + lastName + Date of Birth + workplace,
firstName + lastName + “own %” + companyName,
firstName + lastName + Date of Birth + “deputy mayor” + city

Even though collisions are really unlikely, none of these lines is guaranteed to be unique. We use this data to identify records that likely represent the same person, and we assign a trust score to each match.
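For readers who want to picture this kind of matching, here is a rough Python sketch. It is not our actual business logic: the field names, the 0.7 cap when a date of birth is missing, the 0.5 threshold and the sample person are all made-up assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    # Lowercase and collapse whitespace so "Jean  DUPONT" equals "jean dupont".
    return " ".join(text.lower().split())


def trust_score(a, b):
    """Score in [0, 1] that two records describe the same person (illustrative)."""
    name_a = normalize(f"{a['first_name']} {a['last_name']}")
    name_b = normalize(f"{b['first_name']} {b['last_name']}")
    name_sim = SequenceMatcher(None, name_a, name_b).ratio()
    dob_a, dob_b = a.get("date_of_birth"), b.get("date_of_birth")
    if dob_a and dob_b:
        # Two known but different birth dates rule the match out.
        return name_sim if dob_a == dob_b else 0.0
    # Without a date of birth to confirm, cap the trust (assumed weight).
    return 0.7 * name_sim


def candidate_matches(records, threshold):
    matches = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = trust_score(a, b)
        if score >= threshold:
            matches.append((i, j, score))
    return matches


records = [
    {"first_name": "Jean", "last_name": "Dupont", "date_of_birth": "1970-01-01"},
    {"first_name": "jean", "last_name": " dupont", "date_of_birth": None},
    {"first_name": "Jean", "last_name": "Dupont", "date_of_birth": "1982-05-05"},
]
matches = candidate_matches(records, threshold=0.5)
```

With this sample, record 1 (no date of birth) matches both record 0 and record 2, while records 0 and 2 rule each other out. That is the kind of ambiguity that makes me doubt these attributes as a hub key.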

Even with MDM, I don’t see how to store this data in a HUB in the raw vault.

Thanks for your answer, I have to learn more about MDM!!

The point of my story is… your architecture should include an application that creates and governs a business key recognized by the business. That is a much wider effort than just data modelling.

MDM is used as a data source for DV. It match-merges and folds records into a golden record based on business rules, similar to what you described. There are four kinds of MDM, from ludicrously expensive ones that intrude live into source systems and push MDM ids into them, to less expensive ones that passively find matches between records. The MDM id can be used as a key to integrate your data across source systems.
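As a rough illustration of that last point, assume the MDM tool delivers a crosswalk of (mdm_id, source, source_record_id) rows. The layout, the ids and the hub_person name below are hypothetical; real MDM products expose this differently.

```python
from collections import defaultdict

# Hypothetical crosswalk delivered by a passive, match-only MDM tool.
crosswalk = [
    {"mdm_id": "P-001", "source": "shareholders", "source_record_id": "sh-17"},
    {"mdm_id": "P-001", "source": "elected_officials", "source_record_id": "eo-4"},
    {"mdm_id": "P-002", "source": "shareholders", "source_record_id": "sh-18"},
]


def integrate(crosswalk_rows):
    # The hub holds one row per MDM id: that is the business key.
    hub_person = sorted({row["mdm_id"] for row in crosswalk_rows})
    # Source-level identifiers hang off the MDM id as descriptive context.
    source_records = defaultdict(list)
    for row in crosswalk_rows:
        source_records[row["mdm_id"]].append((row["source"], row["source_record_id"]))
    return hub_person, dict(source_records)


hub_person, source_records = integrate(crosswalk)
```

Here the shareholder record and the elected-official record both resolve to P-001, so they integrate on a key the vault does not have to invent itself.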

Oh ok I think I understand! Thanks for your time and your answers!