Hi All,
We’ve been plugging away at our DV tech debt recently and found a metadata column that we would like to remove from a satellite. Keeping it around doesn’t break anything, but it changes rapidly compared with the rest of the payload, so it inflates our deltas, and it has no real use case downstream. We’d like to either move it to a separate narrow satellite or just remove it until we need it.
I don’t mind keeping the column and just setting it to null for future records, which resolves the volume issue. The problem is that the next load then pulls in a whole bunch of deltas because the hashdiff has changed, yet the effective-from date is the same, so we end up with a lot of duplication in our data.
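To make the hashdiff problem concrete, here is a minimal sketch. The column names, the MD5 choice, the `||` delimiter and the trim/upper-case normalisation are all assumptions for illustration, not our actual hashing standard.

```python
import hashlib

def hashdiff(payload, columns):
    """MD5 over the selected payload columns; NULL is rendered as an empty string."""
    parts = [
        "" if payload[c] is None else str(payload[c]).strip().upper()
        for c in columns
    ]
    return hashlib.md5("||".join(parts).encode("utf-8")).hexdigest()

# Hypothetical payload columns; last_synced_at plays the volatile metadata column.
COLUMNS = ["name", "status", "last_synced_at"]

stored = {"name": "Acme", "status": "active", "last_synced_at": "2024-01-01T00:00:00"}
incoming = dict(stored, last_synced_at=None)  # same record, volatile column nulled

# Hashing with the column still in the list: every key looks changed.
changed_when_nulled = hashdiff(stored, COLUMNS) != hashdiff(incoming, COLUMNS)

# Hashing without the column: the same two records compare as equal.
stable = [c for c in COLUMNS if c != "last_synced_at"]
changed_when_excluded = hashdiff(stored, stable) != hashdiff(incoming, stable)
```

Here `changed_when_nulled` is true for every key that ever had a value in that column, which is the one-off wave of deltas described above, while `changed_when_excluded` is false.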
I know the correct answer is obviously not to do this to begin with, but we all know that tech debt is an inevitability. So what’s the best way of rectifying this issue without losing the existing history in the vault?
Things I’ve considered so far:
- Accepting my losses and resetting the history
- Manually datafixing the satellite after a load to delete records with the same eff_date but different ldts (keeping the most recent ldts)
- Setting the eff_date to current_date for one load so the records update appropriately and kind of represent the state of the history
- Accepting the dupes and adding a view to negate them downstream
- Manually updating the hashdiffs for all historic records to ignore the removed column so the next delta is computed normally. This would mess with the old data and invalidate the satellite’s auditability.
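
For the view option, this is roughly what I have in mind, sketched against an in-memory SQLite database so it runs anywhere. The table name `sat_customer`, the key column `hub_hk` and the sample rows are made up for illustration; only eff_date, ldts and the hashdiff correspond to our real columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical satellite: one row per hub key, effective date and load date.
CREATE TABLE sat_customer (
    hub_hk   TEXT,
    eff_date TEXT,
    ldts     TEXT,
    hashdiff TEXT,
    status   TEXT
);

-- k1 has a duplicate: same eff_date, new hashdiff, later ldts.
INSERT INTO sat_customer VALUES
    ('k1', '2024-01-01', '2024-01-01 02:00:00', 'h_old',  'active'),
    ('k1', '2024-01-01', '2024-03-01 02:00:00', 'h_new',  'active'),
    ('k2', '2024-02-01', '2024-02-01 02:00:00', 'h_old2', 'closed');

-- Keep only the most recent ldts per (hub_hk, eff_date); the base table is untouched.
CREATE VIEW sat_customer_dedup AS
SELECT s.*
FROM sat_customer AS s
WHERE s.ldts = (
    SELECT MAX(d.ldts)
    FROM sat_customer AS d
    WHERE d.hub_hk = s.hub_hk
      AND d.eff_date = s.eff_date
);
""")

rows = conn.execute(
    "SELECT hub_hk, eff_date, ldts FROM sat_customer_dedup ORDER BY hub_hk"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
```

The view returns one row per key and eff_date while the raw satellite keeps all three rows, so nothing is lost for audit. The same predicate, inverted, would identify the rows to remove in the manual datafix option.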
Interested to know your thoughts!
All the best,
Frankie