Immutable Business Key for Customer Contact

Hi all,

I’m new to DVUG, but have been lurking these pages trying to expand my knowledge of DV2.0 beyond the basics, and to decide if DV is the right way to go for a new data platform I’m designing.

I’ve read a few similar posts, most recently Modelling Question on Business Key Updating in source system, which led me to Seven Deadly Sins of Fake Vault. The term “Fake Vault” was coined at… | by Patrick Cuba | The Modern Scientist | Medium and You might be doing #datavault Wrong! | by Patrick Cuba | Medium. I think I’m in very real danger of breaking one or several deadly sins, but I’m still drawn to the flexibility DV appears to offer when integrating new sources, as my solution will need to scale across multiple third-party platforms over time.

My immediate concern relates to customer contact data from multiple third-party platforms (e.g. linkedin, mailchimp). Identifying a single, immutable BK for all sources feels impossible. The business would recognise firstname, lastname, and email_address as the identifiers for many of these source systems, but these can obviously change over time. I could adopt the contact_id from these platforms, with an appropriate BKCC, but that feels like I’m in danger of creating a source system data vault. Also, not all sources have a unique id outside of email_address (for these platforms a new email address == a new person).

A lot of my perceived problems revolve around customer data, and I suspect the right approach in relation to MDM would be get this data into the CRM, and let that handle deduplication, however this approach has been vetoed. I recognise DV isn’t going to magically solve data governance and data quality issues - it may be that given the lack of cleanliness and control over the sources, that DV isn’t the right approach, but I’d love to hear thoughts from people who may have tried to apply DV to this type of challenge.

General advice welcomed, but to make it easier to respond, I think my question boils down which of these is the least bad idea:
a) Create HUB_customer_contact with BKs as a composite of firstname, lastname and email_address, with a Same-As-Link table built from the source contact_id where it exists;
b) Create HUB_customer_contact with BK as the contact_id from the source platforms where it exists, the email_address where no other unique id exists, a BKCC where required, a SAL table for deduplication, and hold a “bag of keys” satellite (Patrick warns against this in the 7 deadly sins post above);
c) Dispense with the DV2.0 model and go with something less-principled.

Thanks for reading.

Jeff

You’re right, data vault isn’t going to magically solve data governance and so on.

if contact_email is an unique key in its own right, is used in multiple channels and you have approval from the business on this, then stick with this as a business key. Keep the ID key in the satellite. In reality, like you say, most systems and channels just say email = person, even if this is not strictly true.

You understand it’s not perfect ( a person <> an email address despite what the systems think) but it may be good enough to start seeing value. You’d build a satellite for each system with its own information, and then agree a ranking on which information source takes priority.

I think the fun starts then with as you build up other information into a potential same as link.

  1. you obviously need to be flagging all this info as PII.
  2. you can create a little app on the satellite data to match / merge and create a BV SAL
  3. You can also use the data from the satellites to create a more integrated satellite but recognise that you need to be careful where this is used downstream.
1 Like

Agree with U, Nat.

I would add fuzzy matching logic to match Satellite (source) data in the BV_SAL_CUSTOMER to resolve whether they belong to the same Customer with a Confidence-level, i.e. we can determine that two source system Satellite records are for the same customer with, say 95+ % Confidence, even though there may be differences in First-Name or Family_Name or even Email address.
Other attributes that can be used in the Fuzzy Match are Date_of_Birth/ Tax_File_Number/Passport #/etc
if available.

Comments welcome.

1 Like