Our source data frequently includes an effective date. We do include the common contact info bits like phone, address, and email. Each one of these has a specific type code and effective date that relates back to a person. Same pattern a million other databases use.
One difference, that may or may not matter, is that we’re doing a full import of the source data every day. We do not have any CDC system in place.
I have been digging in the Linstedt and Cuba books but am having a hard time pinpointing the phrasing that I can use to make my case to our standards group.
When loading the satellites, we are using the pattern of comparing every row in the staging table to only the most recent versions in the satellite partitioned by business key ranked over LoadDate. The business keys for the contact info parts is PERSON+TYPE. If any of the source rows, compared to this most-recent subset of satellite rows have a different hashdiff, they get loaded.
The problem is that the satellites are just full of duplicate data now. Every time it loads, most of the snapshot data is deemed “new”.
There was a line in Cuba’s book, (p 209, under figure 4-46) “Effectivity supplied by the source is merely hashed into a record hash (hashdiff) column and loaded like any other satellite load.”
That was already being done. I’m just missing the logic on what the books are telling us to do vs. what is intuitive in the SQL.
One of the other developers thought it just looked weird and on these source-supplied effectivity tables, he ignored the “most recent vault item” lookup completely and loaded the satellites just like a hub. It works fine.
Tonight, I changed the business key to include PERSON+TYPE+EFFDT. That works. This means, I probably could have just modified the window function to partition by PERSON,TYPE over EFFDT,LoadDate and probably got the same result. What is the common pattern with EFFDT here?
But, in the eyes of the standards group, both of these will appear to be non-compliant workarounds.
We need to document this choice and show that is part of the DV methodology. Just saying, “it works, even though every book and blog post tell us to do something else…” isn’t going to go over well.
So what am I missing?