Effectivity Satellite from Intersection Table with Payload

Hi,

I recently started to learn AutomateDV. It works quite well, but I’m struggeling a little with specific link-satellites.

Consider a simple m:m relation coming from an intersection table:

create table "source".cse_customer_employee
(
	cse_rowid serial primary key,
	cse_created_at timestamp,
	usr_created_by int,
	cse_modified_at timestamp,
	usr_modified_by int,
	cse_deleted_at timestamp,
	usr_deleted_by int,
	-- payload
	cst_customer int,
	emp_employee int,
	cod_role int,
	cse_effective_from date,
	cse_effective_to date, 
	cse_primary boolean
);

In DataVault, I would model this as h_customer and h_employee. These hubs are connected via l_customer_employee with its effectivity satellite s_customer_entitiy.

  • there’s a technical key (cse_rowid) which I’m not going to use, since I think the “datavault-way” of doing it is to integrate using the business keys (see below)
  • business keys (PK) are hk_customer, hk_employee, cod_role and cse_effective_from
  • i want to include/track the payload attributes as well

First Question: Is it okay to put the payload (without the FKs) into s_customer_employee or should it go to a “normal” satellite instead? As far I know, a satellite hanging off of a link is considered an effectivity satellite, yet it’s still a satellite :wink:

In AutomateDV I picked the eff_sat macro. Basically, it works as expcected, but it won’t detect changes in payload (non-key) attributes.

{{ config(materialized='incremental')  }}

{%- set source_model = "v_cse_customer_employee" -%}
{%- set src_pk = "HK_CUSTOMER_EMPLOYEE" -%}
{%- set src_dfk = "HK_CUSTOMER" -%}
{%- set src_sfk = ["HK_EMPLOYEE", "COD_ROLE", "CSE_EFFECTIVE_FROM"] -%}
{%- set src_start_date = "START_DT" -%}
{%- set src_end_date = "END_DT" -%}
{%- set src_extra_columns = ["cse_rowid", "cse_created_at", "usr_created_by", "cse_modified_at", "usr_modified_by", "cse_deleted_at", "usr_deleted_by", "cse_effective_to", "cse_primary"] -%}

{%- set src_eff = "EFFECTIVE_FROM" -%}
{%- set src_ldts = "load_ts" -%}
{%- set src_source = "RECORD_SOURCE"  -%}

-- src_extra_columns added
{{ automate_dv.eff_sat(src_pk=src_pk, src_dfk=src_dfk, src_sfk=src_sfk,
                       src_start_date=src_start_date, 
                       src_end_date=src_end_date,
                       src_extra_columns=src_extra_columns,
                       src_eff=src_eff, src_ldts=src_ldts, 
                       src_source=src_source,
                       source_model=source_model) }}

Second question: Should I use the “normal” satellite template instead or should I take an all different approach? (I don’t think a hub “customer_employee_assigment is a good idea)

Since I included all the necessary attributes into src_dfk and src_sfk, changes in the link’s combination are detected and loaded. However, it won’t detect changes in any other fields, especially cse_effective_to.

Usually, the source system would update cse_effective_from and cse_effective_to (eg. from 2099 to something else) and set cse_modified* - so the change is detected and loaded.

However it can also just update cse_effective_to (and modified) or any other payload attribute (such as the cse_primary flag) - these changes are not detected.

As to my background: I have theoretical understanding of the DV2 modelling, since I attended that “Boot Camp” with Michael Olschimke, but that’s like 10 years since. I got back to their book (the one with the racing car) every here and then to fresh-up my knowledge. Just this week, I received Patrick Cuba’s book (data vault guru) from Amazon. I haven’t built a real Data Vault yet.