Hi,
I recently started to learn AutomateDV. It works quite well, but I’m struggeling a little with specific link-satellites.
Consider a simple m:m relation coming from an intersection table:
create table "source".cse_customer_employee
(
cse_rowid serial primary key,
cse_created_at timestamp,
usr_created_by int,
cse_modified_at timestamp,
usr_modified_by int,
cse_deleted_at timestamp,
usr_deleted_by int,
-- payload
cst_customer int,
emp_employee int,
cod_role int,
cse_effective_from date,
cse_effective_to date,
cse_primary boolean
);
In DataVault, I would model this as h_customer and h_employee. These hubs are connected via l_customer_employee with its effectivity satellite s_customer_entitiy.
- there’s a technical key (cse_rowid) which I’m not going to use, since I think the “datavault-way” of doing it is to integrate using the business keys (see below)
- business keys (PK) are hk_customer, hk_employee, cod_role and cse_effective_from
- i want to include/track the payload attributes as well
First Question: Is it okay to put the payload (without the FKs) into s_customer_employee or should it go to a “normal” satellite instead? As far I know, a satellite hanging off of a link is considered an effectivity satellite, yet it’s still a satellite
In AutomateDV I picked the eff_sat macro. Basically, it works as expcected, but it won’t detect changes in payload (non-key) attributes.
{{ config(materialized='incremental') }}
{%- set source_model = "v_cse_customer_employee" -%}
{%- set src_pk = "HK_CUSTOMER_EMPLOYEE" -%}
{%- set src_dfk = "HK_CUSTOMER" -%}
{%- set src_sfk = ["HK_EMPLOYEE", "COD_ROLE", "CSE_EFFECTIVE_FROM"] -%}
{%- set src_start_date = "START_DT" -%}
{%- set src_end_date = "END_DT" -%}
{%- set src_extra_columns = ["cse_rowid", "cse_created_at", "usr_created_by", "cse_modified_at", "usr_modified_by", "cse_deleted_at", "usr_deleted_by", "cse_effective_to", "cse_primary"] -%}
{%- set src_eff = "EFFECTIVE_FROM" -%}
{%- set src_ldts = "load_ts" -%}
{%- set src_source = "RECORD_SOURCE" -%}
-- src_extra_columns added
{{ automate_dv.eff_sat(src_pk=src_pk, src_dfk=src_dfk, src_sfk=src_sfk,
src_start_date=src_start_date,
src_end_date=src_end_date,
src_extra_columns=src_extra_columns,
src_eff=src_eff, src_ldts=src_ldts,
src_source=src_source,
source_model=source_model) }}
Second question: Should I use the “normal” satellite template instead or should I take an all different approach? (I don’t think a hub “customer_employee_assigment is a good idea)
Since I included all the necessary attributes into src_dfk and src_sfk, changes in the link’s combination are detected and loaded. However, it won’t detect changes in any other fields, especially cse_effective_to.
Usually, the source system would update cse_effective_from and cse_effective_to (eg. from 2099 to something else) and set cse_modified* - so the change is detected and loaded.
However it can also just update cse_effective_to (and modified) or any other payload attribute (such as the cse_primary flag) - these changes are not detected.
As to my background: I have theoretical understanding of the DV2 modelling, since I attended that “Boot Camp” with Michael Olschimke, but that’s like 10 years since. I got back to their book (the one with the racing car) every here and then to fresh-up my knowledge. Just this week, I received Patrick Cuba’s book (data vault guru) from Amazon. I haven’t built a real Data Vault yet.