I have a source system where one of the attributes of a business
object is represented as a JSON array of strings (labels), e.g.,
["label1", "label2", "label3"].
Over time (t), labels can be added to or removed from this list,
e.g.,
t0: ["label1"]
t1: ["label1", "label2"]
t2: []
t3: ["label3"]
I think this is a typical case where a multi-active satellite
could be used. It is very similar to the “phone numbers” example
from the book or the DV workshop. There’s also a good explanation from
Scalefree [1] of how to model it in more detail.
My question is: what’s the best way to model the “transition” to the
empty list at t2 in my example above? All the external examples I’ve
seen show that changing a set of labels causes multiple rows to be
inserted into the multi-active satellite. However, if the list becomes
empty, there’s nothing to insert. Do you think inserting a row
with a NULL label value would be appropriate and conform to the DV
standard?
PS: I’ve considered alternative approaches, such as a “weaker hub” with a
link and an effectivity satellite, but they don’t fit very well,
because the example I’ve presented here is just a simplified version
of a more complicated “payload.”
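A minimal Python sketch of the two options raised in the question, assuming a row-per-label multi-active satellite with one set-level hashdiff (MD5 over the sorted labels serialized as JSON), integer load times standing in for LDTS, and a hypothetical business key "BO-1". The helper names `load_ma_sat` and `labels_as_of` are illustrative only, not part of any DV tooling.

```python
import hashlib
import json


def hash_key(business_key):
    # Assumption: MD5 over the trimmed, upper-cased business key
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def set_hashdiff(labels):
    # One hashdiff for the whole label set; sorting makes it order-independent
    return hashlib.md5(json.dumps(sorted(labels)).encode("utf-8")).hexdigest()


def load_ma_sat(business_key, snapshots, null_marker):
    """Load (ldts, labels) snapshots into multi-active satellite rows.

    With null_marker=True, an empty list is written as one row with label=None.
    """
    hk = hash_key(business_key)
    rows, last_hd = [], None
    for ldts, labels in snapshots:
        hd = set_hashdiff(labels)
        if hd == last_hd:
            continue  # set unchanged since the last insert: nothing to load
        if labels:
            rows += [{"hk": hk, "ldts": ldts, "label": label, "hashdiff": hd}
                     for label in sorted(labels)]
            last_hd = hd
        elif null_marker:
            rows.append({"hk": hk, "ldts": ldts, "label": None, "hashdiff": hd})
            last_hd = hd
        # else: empty list and no marker, so no row is written and last_hd stays put
    return rows


def labels_as_of(rows, as_of):
    # The set valid at as_of is the one from the latest load not after as_of
    loads = [r["ldts"] for r in rows if r["ldts"] <= as_of]
    if not loads:
        return []
    latest = max(loads)
    return sorted(r["label"] for r in rows
                  if r["ldts"] == latest and r["label"] is not None)


# t0..t3 from the question, with integer load times as a stand-in for LDTS
snapshots = [(0, ["label1"]), (1, ["label1", "label2"]), (2, []), (3, ["label3"])]
plain = load_ma_sat("BO-1", snapshots, null_marker=False)
marked = load_ma_sat("BO-1", snapshots, null_marker=True)
print(labels_as_of(plain, 2))   # ['label1', 'label2']  (t2 leaves no trace)
print(labels_as_of(marked, 2))  # []
```

Without a marker row, an as-of query for t2 still returns the t1 set; the single NULL-label row is what makes the empty set visible.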
I don’t know if this is the best way to do it, but this is how I solve these types of issues. First, we have to think a little about what a delta satellite is and what a record tracking satellite is. If data changes are tracked in a delta satellite, and the existence of a key (or of a relationship between keys) is tracked in a record tracking satellite (RTS), then there is no need to “tell” the delta satellite when a key stops existing.
So in your case, t2 won’t add any change to the multi-active satellite. Instead, at t2 a new row is added to the RTS, recording that there is no longer a list of labels for that specific key.
At t3, another row is inserted into the RTS, recording that a list is present again for that key from then on. In addition, the change between t1 and t3 is inserted into the multi-active satellite.
I hope you can follow my thinking.
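The record tracking satellite approach described above can be sketched like this. It is a minimal Python illustration under stated assumptions: the RTS column `has_labels` and the change-only RTS inserts follow the description in this answer rather than any fixed standard, the hashdiff is MD5 over the sorted labels, and `hk-BO-1` is a placeholder hash key.

```python
import hashlib
import json


def set_hashdiff(labels):
    # One hashdiff for the whole label set, independent of order
    return hashlib.md5(json.dumps(sorted(labels)).encode("utf-8")).hexdigest()


def load(hk, snapshots):
    """Split each load into a multi-active satellite delta and an RTS status row."""
    ma_sat, rts = [], []
    last_hd, last_present = None, None
    for ldts, labels in snapshots:
        present = bool(labels)
        if present != last_present:
            # RTS (assumed shape): record that the list appeared or disappeared
            rts.append({"hk": hk, "ldts": ldts, "has_labels": present})
            last_present = present
        if present:
            hd = set_hashdiff(labels)
            if hd != last_hd:
                # Delta satellite: only non-empty sets that differ from the last one
                ma_sat += [{"hk": hk, "ldts": ldts, "label": label, "hashdiff": hd}
                           for label in sorted(labels)]
                last_hd = hd
    return ma_sat, rts


def labels_as_of(ma_sat, rts, as_of):
    status = [r for r in rts if r["ldts"] <= as_of]
    if not status or not max(status, key=lambda r: r["ldts"])["has_labels"]:
        return []  # according to the RTS, the key has no list at as_of
    latest = max(r["ldts"] for r in ma_sat if r["ldts"] <= as_of)
    return sorted(r["label"] for r in ma_sat if r["ldts"] == latest)


snapshots = [(0, ["label1"]), (1, ["label1", "label2"]), (2, []), (3, ["label3"])]
ma_sat, rts = load("hk-BO-1", snapshots)
for t in range(4):
    print(t, labels_as_of(ma_sat, rts, t))
```

A side effect of this split: if the list at t3 were identical to the one at t1, the multi-active satellite would get no new rows at all; the RTS flipping back to "present" is enough for an as-of query to return the t1 set again.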
PS: There is also a way to skip the multi-active satellite by increasing the grain of the link key (adding a counter to the labels in the staging area, or using a key if one is present in the dataset).
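One way to read the counter suggestion, as a minimal Python sketch: the staging step explodes the JSON array into one row per label and derives a finer-grained hash key from the parent business key plus the label's position. The `||` delimiter, MD5, and the function name `stage_labels` are assumptions for illustration only.

```python
import hashlib
import json


def stage_labels(business_key, labels_json, ldts):
    """Explode a JSON label array into one staged row per label, with a counter."""
    rows = []
    for seq, label in enumerate(json.loads(labels_json), start=1):
        # Hypothetical finer-grained key: parent business key plus the counter
        key = f"{business_key.strip().upper()}||{seq}"
        rows.append({
            "hk": hashlib.md5(key.encode("utf-8")).hexdigest(),
            "business_key": business_key,
            "seq": seq,
            "label": label,
            "ldts": ldts,
        })
    return rows


for row in stage_labels("BO-1", '["label1", "label2"]', 1):
    print(row["seq"], row["label"], row["hk"])
```

The counter is only stable if the source keeps the array order; a reordered list shows up as changes. If labels are unique per business object, the label itself could serve as the key instead. An empty array still produces no staged rows, so the disappearance at t2 would still need to be tracked elsewhere, e.g. in a record tracking satellite as described above.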
However, there’s an edge case where the answer to both questions (“has the hashdiff changed?” and “has the number of records changed?”) is “yes”, namely the transition to an empty set at t2.
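The two checks can be sketched as follows. This assumes one set-level hashdiff over the sorted labels and a row-per-label multi-active satellite; the thread does not say how the hashdiff is actually computed, and `delta_check` is a hypothetical helper. At t2 both checks come back true, yet the load has zero rows to write.

```python
import hashlib
import json


def set_hashdiff(labels):
    # Assumed set-level hashdiff: MD5 over the sorted labels serialized as JSON
    return hashlib.md5(json.dumps(sorted(labels)).encode("utf-8")).hexdigest()


def delta_check(prev_labels, curr_labels):
    """Answer both delta questions for one key and count the rows a load would write."""
    hashdiff_changed = set_hashdiff(prev_labels) != set_hashdiff(curr_labels)
    count_changed = len(prev_labels) != len(curr_labels)
    # A row-per-label multi-active satellite writes one row per label in the new set
    rows_to_insert = len(curr_labels) if hashdiff_changed else 0
    return hashdiff_changed, count_changed, rows_to_insert


print(delta_check(["label1"], ["label1", "label2"]))  # t0 -> t1: normal change
print(delta_check(["label1", "label2"], []))          # t1 -> t2: the edge case
```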
The way I understand it, what you are suggesting is a regular satellite (whose PK is the parent’s HK plus the LDTS) with the twist that the attributes are stored in a semi-structured field (denormalized form), e.g. a PostgreSQL array.
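A minimal Python sketch of that reading, assuming one satellite row per (HK, LDTS) with the whole label set in a single list-valued attribute (a stand-in for a PostgreSQL `text[]` or JSON column) and a hashdiff over the sorted labels. `load_sat` and `hk-BO-1` are illustrative names. Because the empty list is itself a value, t2 simply gets its own row and no NULL marker is needed.

```python
import hashlib
import json


def hashdiff(labels):
    # Hashdiff over the whole set; sorting makes it independent of array order
    return hashlib.md5(json.dumps(sorted(labels)).encode("utf-8")).hexdigest()


def load_sat(hk, snapshots):
    """Regular satellite: PK is (hk, ldts), the label set is one attribute."""
    sat, last_hd = [], None
    for ldts, labels in snapshots:
        hd = hashdiff(labels)
        if hd != last_hd:
            sat.append({"hk": hk, "ldts": ldts, "labels": sorted(labels), "hashdiff": hd})
            last_hd = hd
    return sat


# t4 repeats t3, so the delta check skips it
snapshots = [(0, ["label1"]), (1, ["label1", "label2"]), (2, []), (3, ["label3"]),
             (4, ["label3"])]
sat = load_sat("hk-BO-1", snapshots)
for row in sat:
    print(row["ldts"], row["labels"])
```

In PostgreSQL an empty array (`'{}'`) is distinct from NULL, so the same distinction carries over to the database column.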