I’m hoping someone can please show me a cleaner way to do this. I want to create a type 1 dimension based off a hub, two satellites (for systems A and B) and a same-as-link. Not all records in system A exist in system B, and vice versa. Additionally, not all records in the hub may exist in either system A or system B. But I want my dimension to be comprehensive.
This is the code I’ve come up with, and it works, but I don’t think it’s the ‘Data Vault way’. I’ve done the course, and read the books. What have I missed?
WITH
sat_sf AS (
SELECT --Get the most recent version of bank from the Salesforce Satellite, including the business key from the hub
hub.funder_bk,
hub.funder_pk,
sat.load_date,
sat.name
FROM
HUB_FUNDER AS hub
INNER JOIN SAT_H_FUNDER__SF_BANK AS sat
ON hub.funder_pk = sat.funder_pk
QUALIFY
ROW_NUMBER() OVER (PARTITION BY sat.funder_pk ORDER BY sat.load_date DESC) = 1
),
sat_platform AS (
SELECT --Get the most recent version of bank from the Platform Satellite, including the business key from the hub
hub.funder_bk,
hub.funder_pk,
sat.load_date,
sat.funder_name AS name
FROM
HUB_FUNDER AS hub
INNER JOIN SAT_H_FUNDER__PLATFORM_BANK AS sat
ON hub.funder_pk = sat.funder_pk
QUALIFY
ROW_NUMBER() OVER (PARTITION BY sat.funder_pk ORDER BY sat.load_date DESC) = 1
),
union_set AS ( -- get a full set of all keys, but only one record per funder, even if in multiple systems
SELECT -- get the primary key for all Salesforce records that don't exist in the SAL
NULL AS platform_pk,
funder_pk AS sf_pk,
NULL AS other_pk,
NULL AS other_bk
FROM
sat_lgsf
WHERE
funder_pk NOT IN (SELECT funder_pk__sf_bank FROM SAL_FUNDER__SF_BANK__PLATFORM_FUNDER)
UNION ALL
SELECT -- get the primary key for all Platform records that don't exist in the SAL
funder_pk AS platform_pk,
NULL AS sf_pk,
NULL AS other_pk,
NULL AS other_bk
FROM
sat_platform
WHERE
funder_pk NOT IN (SELECT funder_pk__platform_funder FROM SAL_FUNDER__SF_BANK__PLATFORM_FUNDER)
UNION ALL
SELECT -- Get the primary keys for records that do exist in the SAL
funder_pk__platform_funder AS platform_pk,
funder_pk__sf_bank AS sf_pk,
NULL AS other_pk,
NULL AS other_bk
FROM
SAL_FUNDER__SF_BANK__PLATFORM_FUNDER
UNION ALL
SELECT -- finally add in any record in the hub that hasn't already been taken account of
NULL AS platform_pk,
NULL AS sf_pk,
funder_pk AS other_pk,
funder_bk AS other_bk
FROM
hub_funder
WHERE
funder_pk NOT IN (SELECT funder_pk FROM sat_sf)
AND funder_pk NOT IN (SELECT funder_pk FROM sat_platform)
)
SELECT -- bring it all together with a COALESCE to choose which system to use as a primary source
COALESCE(union_set.platform_pk, union_set.sf_pk, union_set.other_pk) AS _dim_pk,
COALESCE(sat_sf.funder_bk, '') AS sf_funder,
COALESCE(sat_platform.funder_bk, '') AS platform_funder,
COALESCE(union_set.other_bk, '') AS other_funder,
COALESCE(sat_sf.name, sat_platform.name, '') AS funder_name
FROM
union_set
LEFT JOIN sat_sf
ON union_set.sf_pk = sat_sf.funder_pk
LEFT JOIN sat_platform
ON union_set.platform_pk = sat_platform.funder_pk
Based on the above, if you were to have multiple SAL tables for the same object (system A to system B, and another for system B to system C) then you could join these together in a cte it the top, and then reference it as needed in the union_set.
EDIT: For context on the naming:
SAT_H_FUNDER__SF_BANK as a table name infers that it is a Satellite table, that hangs off Hub Funder, and is sourced from the Bank table in Salesforce.
SAL_FUNDER__SF_BANK__PLATFORM_FUNDER as a table name infers that it is a Same-as-Link, that is related to Hub Funder, and provides the relationship between the Banks in Salesforce and the Funders in Platform.
They may not be perfect, but they work for us.