Hi all,
I’m looking to implement my first bridge table using dbtvault. I looked up @patrickcuba pseudo code, but unfortunately, that doesn’t cover effectivity satellites.
So I have a Hub (called “hub”), with two Links (called “link_1” and “link_2”), that both have an effectivity satellite (called “eff_sat_1” and “eff_sat_2”). The eff_sat contains an applied date called “eff_dts”, with effectivity start and end dates called “eff_start_dts” and “eff_end_dts”. This results in the following generated dbtvault sql:
WITH as_of_dates AS (
SELECT *
FROM as_of_date
),
new_rows AS (
SELECT
hub.hub_hk,
as_of_dates.AS_OF_DATE,
link_1.link_hk AS link_1_hk,
eff_sat_1.eff_end_dts AS eff_sat_1_eff_end_dts,
eff_sat_1.load_dts AS eff_sat_1_load_dts,
link_2.link_hk AS link_2_hk,
eff_sat_2.eff_end_dts AS eff_sat_2_eff_end_dts,
eff_sat_2.load_dts AS eff_sat_2_load_dts
FROM hub
INNER JOIN as_of_dates
ON (1=1)
LEFT JOIN link_1
ON hub.hub_hk = link_1.foreign_hub_hk_1
INNER JOIN eff_sat_1
ON eff_sat_1.link_hk = link_1.link_hk
AND eff_sat_1.eff_dts <= b.AS_OF_DATE
LEFT JOIN link_2
ON link1.foreign_hub_hk _2 = link_2.foreign_hub_hk_1
INNER JOIN eff_sat_2
ON eff_sat_2.link_hk = link_2.link_hk
AND eff_sat_2.eff_dts <= b.AS_OF_DATE
),
all_rows AS (
SELECT * FROM new_rows
),
candidate_rows AS (
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY
AS_OF_DATE,
link_1_hk, link_2_hk
ORDER BY
eff_sat_1_load_dts DESC, eff_sat_2_load_dts DESC
) AS ROW_NUM
FROM all_rows
) AS a
WHERE a.ROW_NUM = 1
),
bridge AS (
SELECT
c.hub_hk,
c.AS_OF_DATE,
c.link_1_hk,
c.link_2_hk
FROM candidate_rows AS c
WHERE TO_DATE(c.eff_sat_1_eff_end_dts) = TO_DATE('9999-12-31 23:59:59.9999999')
AND TO_DATE(c.eff_sat_2_eff_end_dts) = TO_DATE('9999-12-31 23:59:59.9999999')
)
SELECT * FROM bridge
My questions:
- in the new_rows CTE, won’t the INNER JOIN to an eff_sat result in records falling out if that eff_sat didn’t exist at a certain as_of_date?
- in the bridge CTE, the WHERE clause on eff_end_dts = 9999… isn’t that too late? Shouldn’t that be moved up to the ON clause in the eff_sat JOIN in new_rows CTE?
- the bridge_walk is used to walk across multiple links, where fk1 in link2 is joined on fk2 in link1. What if I want to join all links on fk1 (in other words, the base hub)? For instance, I have a Hub, with a Link_Parent and Link_Ultimate_Parent that both have the same driving key. So basically, I want this in my Bridge:
Additionally, I would like to have the option to include the hub hashkeys (and hub business keys) in the final SELECT as well.
I’m going to create some explicit test cases to challenge my own questions… those will follow.
edit: below I also changed the example on Macros - dbtvault to include an additional link. Please notice the fact that in my own example, I use the “eff_dts” as as_of_date and join my eff_sats accordingly. The dbtvault example relies on the load_datetime timeline.
WITH as_of AS (
SELECT a.AS_OF_DATE
FROM AS_OF_DATE AS a
WHERE a.AS_OF_DATE <= CURRENT_DATE()
),
new_rows AS (
SELECT
a.CUSTOMER_PK,
b.AS_OF_DATE,
LINK_CUSTOMER_ORDER.CUSTOMER_ORDER_PK AS LINK_CUSTOMER_ORDER_PK,
EFF_SAT_CUSTOMER_ORDER.END_DATE AS EFF_SAT_CUSTOMER_ORDER_ENDDATE,
EFF_SAT_CUSTOMER_ORDER.LOAD_DATETIME AS EFF_SAT_CUSTOMER_ORDER_LOADDATE,
LINK_CUSTOMER_ADDRESS.CUSTOMER_ADDRESS_PK AS LINK_CUSTOMER_ADRESS_PK,
EFF_SAT_CUSTOMER_ADDRESS.END_DATE AS EFF_SAT_CUSTOMER_ADDRESS_ENDDATE,
EFF_SAT_CUSTOMER_ADDRESS.LOAD_DATETIME AS EFF_SAT_CUSTOMER_ADDRESS_LOADDATE
FROM DBTVAULT_DEV.TEST.HUB_CUSTOMER AS a
INNER JOIN AS_OF AS b
ON (1=1)
LEFT JOIN LINK_CUSTOMER_ORDER AS LINK_CUSTOMER_ORDER
ON a.CUSTOMER_PK = LINK_CUSTOMER_ORDER.CUSTOMER_FK
INNER JOIN EFF_SAT_CUSTOMER_ORDER AS EFF_SAT_CUSTOMER_ORDER
ON EFF_SAT_CUSTOMER_ORDER.CUSTOMER_ORDER_PK = LINK_CUSTOMER_ORDER.CUSTOMER_ORDER_PK
AND EFF_SAT_CUSTOMER_ORDER.LOAD_DATETIME <= b.AS_OF_DATE
LEFT JOIN LINK_CUSTOMER_ADDRESS AS LINK_CUSTOMER_ADDRESS
ON a.CUSTOMER_PK = LINK_CUSTOMER_ADDRESS.CUSTOMER_FK
INNER JOIN EFF_SAT_CUSTOMER_ADDRESS AS EFF_SAT_CUSTOMER_ADDRESS
ON EFF_SAT_CUSTOMER_ADDRESS.CUSTOMER_ADDRESS_PK = LINK_CUSTOMER_ADDRESS.CUSTOMER_ADDRESS_PK
AND EFF_SAT_CUSTOMER_ADDRESS.LOAD_DATETIME <= b.AS_OF_DATE
),
all_rows AS (
SELECT * FROM new_rows
),
candidate_rows AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY
AS_OF_DATE,
LINK_CUSTOMER_ORDER_PK,
LINK_CUSTOMER_ADDRESS_PK
ORDER BY
EFF_SAT_CUSTOMER_ORDER_LOADDATE DESC,
EFF_SAT_CUSTOMER_ADDRESS_LOADDATE DESC,
) AS row_num
FROM all_rows
QUALIFY row_num = 1
),
bridge AS (
SELECT
c.CUSTOMER_PK,
c.AS_OF_DATE,
c.LINK_CUSTOMER_ORDER_PK,
c.LINK_CUSTOMER_ADDRESS_PK
FROM candidate_rows AS c
WHERE TO_DATE(c.EFF_SAT_CUSTOMER_ORDER_ENDDATE) = TO_DATE('9999-12-31 23:59:59.999999')
AND TO_DATE(c.EFF_SAT_CUSTOMER_ADDRESS_ENDDATE) = TO_DATE('9999-12-31 23:59:59.999999')
)
SELECT * FROM bridge