How to model an empty list in multi-active satellite?

Hi all,

I have a source system where one of the attributes of a business
object is represented as a JSON array of strings (labels), e.g.,
[“label1”, “label2”, “label3”].

Over time (t), new labels can be added or removed from this list,
e.g.,

  • t0: [“label1”]
  • t1: [“label1”, “label2”]
  • t2:
  • t3: [“label3”]

I think that this is a typical example, where multi-active satellite
could be used. This is very similar to the “phone numbers” example
from the book or DV workshop. There’s also a good explanation from
Scalefree [1] how one could model it in more detail.

My question is, what’s the best way to model the “transition” to the
empty list at t2 in my example above. All external examples
demonstrate that changing a set of labels causes an insert of multiple
rows into multi-active satellite. However, if the list becomes empty,
there’s nothing to be inserted. Do you think that inserting a row
with a NULL label value would be appropriate and conforming to DV
standard?

[1] https://www.scalefree.com/scalefree-newsletter/using-multi-active-satellites-the-correct-way-2-2/

PS I’ve considered alternative approaches, such as “weaker hub” with a
link and effectivity satellite, but they don’t apply very well,
because the example I’ve presented here is just a simplified version
of a more complicated “payload.”

I don’t know if this is the best way to do it but this is how I solve these type of issues. First of all we have to think a little about what a delta satellite is and what a record tracking satellite is. If data changes are tracked in a delta satellite and if existance of a key or relations of keys are tracked in a record tracking satellite then there will be no need to “tell” the delta satellite if a key stops to exist.
So, in your case, day2 wont add any change to the multi active satellite. But on day2, a new row in a RTS will be added telling the system that for that specific key there is no longer a list of labels.
On day3 a another row will be inserted into the RTS saying that a list for the specific key is from now on present again. Also the change between t1 och t3 will be inserted into the multi active satellite.
Hope your follow how I am thinking?

PS: There is also a way to skip the multi active satellite by increasing the grain of the link-key (Adding a counter to the labels in the stage or if there is a key present in the dataset).

Hope this helps

Hello,

In DV2.0 there is no such thing as a “Weak Hub”; that is an ensemble data vault concept

Assuming you’re getting multiple records and you want to treat them as a SET then MSAT would be ideal because an MSAT looks at two things:

  • Do any of the HashDiffs in the SET differ?
  • Do the number of records in the SET differ?

If either of the above are true then insert the delta.

Hi Patrick,

Yes, this is a standard MSAT loading pattersn!

However, there’s an edge case where both answers are “yes” (hashdiff changed? and number of records changed?), namely when there is a transition to an empty set at t2:

Does it result in MSAT like this?


| ldts | seq | label    |
|------+-----+----------|
| t0   |   1 | 'label1' |
| t1   |   1 | 'label1' |
| t1   |   2 | 'label2' |
| t2   |   1 | NULL     |
| t3   |   1 | 'label3' |

label is a list, semi-structured column
I’d expect:

| ldts | seq | label |
|------±----±---------|
| t0 | 1 | ‘label1’ |
| t1 | 2 | ‘label1’, ‘label2’ |
| t2 | 3 | NULL |
| t3 | 4 | ‘label3’ |

no?

MSAT condition is disjunctive (OR)

Hey Patrick,

They way I understand is that what you are suggesting is a regular Satellite (where PK of the satellite is defined by HK of parent & LDTS) with a twist that the attributes are stored in a semi-structured field (denormalized form), e.g. PostgreSQL array.

What I’m trying to do is a standard MSAT where for a given parent HK, there can be multiple records with the same load timestamp (multi-active). PK of the MSAT satellite is made of HK of the parent, LDTS, sub-seq (https://www.scalefree.com/scalefree-newsletter/using-multi-active-satellites-the-correct-way-2-2/).

My question is really about the standard DV2.0 MSAT and what do load if the source system has no active records for a certain time range.

Thanks for an advice,
Marek

No, I did not, the structure I proposed is an MSAT. No twist, it’s the standard pattern.

1 Like

Thank you for clarification. I’ll add this structure to my DV toolset.