Modeling Issues due to data quality

Hello all,

I work in the healthcare industry so our 3 data domains are Providers, Claims, and Members.

Our Member model built out in DV is actually really good. But, I’m having issues with the Provider model.

Current Business Concepts for Provider we have:
Individuals - BK is NPI
Provider Organizations - BK is NPI + TaxId
Provider Organization Locations - BK is NPI + Provider Medicaid + Zip Code

So, every individual needs to be identified by an NPI, its sort of like their Social Security #. So, we are good on that front. The real issue is our Provider Organizations / Locations.

Ideally, an organization should have both NPI/Tax_Id. However, this is not always true in our source systems. For example, we have providers that only have the NPI, but no TaxId. Or only have the TaxId but no NPI. So, when I create the BKs for an organization, I allow these combinations to flow into the HUB… since i’m staying factual to the source systems.

I’m wondering if this is the correct way to go about it?

We are also in the process of changing to a new claims system, and all our providers in there must have NPI/Tax_Id. Which, will really help improve the model.

An example I wanted to talk about is where our main source system has only NPI for a provider record, but our secondary system has the NPI & TaxId. Currently, I bring both records into the HUB as 2 separate records. And then, I create a SAL in the business layer to link them together. Is this the correct way to do this?

NPI: 100 / Tax_Id: ’ ’ (source A)
NPI: 100 / Tax_Id: 123 (source B)

In the SAL, the provider record without the TaxId will become associated to the record with the TaxId.

I think there is enough in this topic so far, so i wont talk about the location stuff just yet.

Anyone have any input? We are also trying to get our main system cleaned up, so records without the TaxId they will go back in and input a value.

Thanks in advance,

Hey Josh,

What is the TAX_ID on its own? Can the Provider-Org be identified with Tax_ID alone?
Is the tax_id a dependent-child key? To identify a Provider Org relies on NPI + TaxID or is there another unique identifier?

By looking at what you have written, an NPI identifies an individual, how/why would NPI+TaxID also be used to identify a ProviderOrg?

From what I am reading it looks like you have one hub, hub_individual and provider org and provider org location look like satellites to that single hub. If this is correct then do not treat NPI+TaxID as a business key because it isn’t, TaxID may merely be a dependent-child key to the hub. Therefore you would not care if TaxID is null or not.

I hope I interpreted your description correctly.

Patrick’s question is apt. Selecting the right business key is … well … key. :wink:

If you don’t put enough in the key, then it will not uniquely identify each entity in the domain. But putting too much in the key can also cause problems. If I were to choose a BK of [Tax ID] + [Full Name] as the BK for a Person hub, then every variation in the spelling of the Full Name (“Joseph Baker”, “Joe Baker”, “Joseph A. Baker”) would generate a separate hub row, even if the same Tax ID was associated with each name.

Hi, I also worked in dental insurance, in the provider area where there were 4 different databases, 2 internally owned and global for all USA but for which was hosted in my company.
So i remember well all sorts of incompleteness of keys…

And you right the 3 cores concepts must exist as hubs.
On my case under the +300 tables the locations model was basicall a link table called npc:
npc= (network,provider,clinic) with star/end dates, status etc.

Then we had sublinks:

And ofc claims had (n,p,c) fk and person fk

Clinic like you mentioned, could range from big organization i to just one dentist (provider), so all the possibilities on the pk (npi+tax, tax or npi)
I would say to keep all providers in just one hub, no need to add a SAL , just use delimiter when hashsing stuff:
provider_hub.provider_id = hash(Npi || ‘|’ || tax_id)

Updates after checking my notes:

  • provider_hub will potentially come from different systems, so i would imagine making sense to have one SAT per source system.

For clinic_assoc_patient probably a link table (clinic, person_subscriber) and have dependents persons on SAT.

One of the global provider databases even had 3 levels hierarchy for providers/clinics:
Business _npi - office_npi - provider
Each level having fks from upper level

Should be fun modelling in DV all the system, i had lots of fun and worked in all areas of dental insurance and yes real world incomplete bks or even source database models (tree top down vs spaghetti flat adds some fun, but i was able to create new model and have all systems daily sync with new set of core tables where i resolved all/any differences.
This was ahead of data vault, but ideas were there like using journal tables in same way as DataVault SATlites.