What's the best approach for removing columns from a satellite?

Hi All,

We’ve done some plugging away at our DV tech debt recently and found a metadata column that we would like to remove from a satellite. It’s not breaking anything to keep around, but it changes rapidly compared to the rest of the payload, so it inflates our deltas and has no real use case downstream. We’d like to either move it to a separate narrow satellite or just remove it until we need it.

I don’t mind keeping the column and just setting it to null for future records, which resolves the volume issue. But then the next load pulls in a whole bunch of deltas, since the hashdiff has changed while the effective-from date stays the same, so we end up with a lot of duplication in our data.
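To make the mechanism concrete, here is a minimal sketch of why nulling the column triggers a wave of spurious deltas. The column names, delimiter, and hash function are hypothetical, not from this thread; the point is only that the hashdiff covers the nulled column, so every key appears changed on the next load:

```python
import hashlib

def hashdiff(record, columns):
    """Typical satellite hashdiff sketch: concatenate the payload
    columns (NULL -> empty string) and hash the result."""
    concat = "||".join("" if record.get(c) is None else str(record[c])
                       for c in columns)
    return hashlib.md5(concat.encode("utf-8")).hexdigest()

cols = ["status", "volatile_meta"]
before = {"status": "ACTIVE", "volatile_meta": "run-4711"}
after = {"status": "ACTIVE", "volatile_meta": None}  # column nulled going forward

# Nulling the column changes the hashdiff even though nothing meaningful
# changed, so the loader sees a "delta" for every key on the next run.
print(hashdiff(before, cols) != hashdiff(after, cols))          # True
# Excluding the column from the hashdiff avoids the false delta.
print(hashdiff(before, ["status"]) == hashdiff(after, ["status"]))  # True
```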

I know the correct answer is obviously not to do this to begin with but we all know that tech debt is an inevitability so what’s the best way of rectifying this issue without losing the existing history in the vault?

Things I’ve considered so far:

  • Accepting my losses and resetting the history
  • Manually datafixing the satellite after a load to delete records with the same eff_date but different ldts (keeping the most recent ldts)
  • Setting the eff_date to current_date for one load so the records update appropriately and kind of represent the state of the history
  • Accepting the dupes and adding a view to negate them downstream
  • Manually updating the hashdiffs for all historic records to ignore the removed column, so the next delta is computed normally; this will mess with the old data and invalidate the sat’s auditability
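The “view to negate them downstream” option from the list above can be sketched roughly like this: per (hub key, effective date), keep only the row with the latest load timestamp, the way a `ROW_NUMBER() ... = 1` window-function view over the satellite would. The field names are assumptions for illustration:

```python
def dedupe(rows):
    """Keep only the most recent load (max ldts) per (hk, eff_date),
    hiding the duplicates the changed hashdiff created."""
    latest = {}
    for row in rows:
        key = (row["hk"], row["eff_date"])
        if key not in latest or row["ldts"] > latest[key]["ldts"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: (r["hk"], r["eff_date"]))

rows = [
    {"hk": "A", "eff_date": "2024-01-01", "ldts": "2024-01-01", "payload": "x"},
    # duplicate created by the changed hashdiff: same eff_date, newer ldts
    {"hk": "A", "eff_date": "2024-01-01", "ldts": "2024-06-01", "payload": "x"},
]
print(len(dedupe(rows)))  # 1
```

The underlying satellite stays untouched and auditable; only consumers of the view see the cleaned history.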

Interested to know your thoughts!

All the best,
Frankie

Hi Frankie,
It seems you’ve already identified the solutions and their consequences, and now have to decide…
I’d say: if this metadata column is giving you a lot of unnecessary changes anyway, set it to NULL at source. This implicitly triggers a one-off initial-load-style delta without any change in your DV platform.
Going for the “update hashdiff” route is only necessary for the most recent record per HK/BK. To keep auditability, add a satellite recording all the BK/HKs, LDTSs, and hashdiffs you updated.
Yes, DV2 is a zero-update approach, but this can still be an option depending on costs or technical feasibility (read-only platforms).
Another approach: bring up a new satellite that loads this piece of the source without the metadata column, stop loading the old one, and build a “union” view (with a cutover date) to expose the entire history. Benefit: you can build the new structure in parallel with the old and do the cutover once the new one is tested and approved. It’s my favorite approach for making changes, especially in the BV (changing business rules).
What are your platform and the volume you’re processing? These are two of the parameters for a proper decision.
Warm regards
Volker

Hey Frankie,

Options:

  1. Huge volume: stop including it in the delta on load, but leave the column there. BUT, as I think you’ve pointed out, you’ll get an ‘update’ for every record
  2. Huge volume (but cheating): if you know what the ‘active’ value for that column is, include that value in the delta – it will always match the current record because you will never change it. I don’t like this option because it introduces its own tech debt, but you don’t get delta changes
  3. Not so bad volume. Bite the bullet, refactor and ensure each record has a new hashdiff
  4. Not so bad volume (tech debt 2). Start a new satellite without that column and stop loading the old one. This has tech debt too, because you still need to process the old satellite up to a certain date (in a union all) and remove duplicates every time you do (you’ll have duplicates where only the column you just declared defunct had changed).
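The dedup step in option 4 can be sketched like this: once the defunct column is dropped from the old satellite’s output, consecutive versions of a key may carry identical remaining payloads, so only the first of each unchanged run should survive. Field names are assumptions for illustration:

```python
def collapse_unchanged(rows, payload_cols):
    """Drop consecutive versions of a key whose remaining payload columns
    are identical, keeping the earliest row of each unchanged run."""
    out = []
    for row in sorted(rows, key=lambda r: (r["hk"], r["ldts"])):
        prev = out[-1] if out and out[-1]["hk"] == row["hk"] else None
        if prev is None or any(row[c] != prev[c] for c in payload_cols):
            out.append(row)
    return out

rows = [
    {"hk": "A", "ldts": 1, "status": "ACTIVE"},  # only the defunct column changed
    {"hk": "A", "ldts": 2, "status": "ACTIVE"},  # -> now a duplicate, dropped
    {"hk": "A", "ldts": 3, "status": "CLOSED"},  # genuine change, kept
]
print(len(collapse_unchanged(rows, ["status"])))  # 2
```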