Code template to use effectivity satellites for multiple sources

Hi,

We have stumbled upon following challange:

  1. We have identified multiple links, where there is a driving key and relation to other hubs can change over time.
  2. We have introduced multiple effectivity satellite in order to track the relations for each source of the link.
  3. We want now to use the effectivity satellites in the code of bridge tables, however we have not found a straightforward code pattern and used the following code where we union all the effective satellites for given link.

INSERT INTO BRIDGE (hk_hub1, hk_hub2, …, snapshot_date)

SELECT

FROM
as_of join hub1 on 1=1
INNER JOIN link1 on hub1.hk_hub1 = link1.hk_hub1
INNER JOIN
(
– select active links from source1
SELECT hk_link1
FROM effSat1_current – current view on top of effectivity satellite
WHERE snapshot_date>= dv_start_date and snapshot_date<dv_end_date
and hk_link1 <> 0x0 – eliminate ghost record
UNION ALL
– select active links from source2
SELECT hk_link1
FROM effSat2_current
WHERE snapshot_date>= dv_start_date and snapshot_date<dv_end_date
and hk_link1 <> 0x0 – eliminate ghost record
) eff_sats on link1.hk_link1 = eff_sats.hk_link1

Is there a better code pattern for applying effectivity logic for link with multiple sources?
Or maybe you would use one effectivity satellite for link and don’t dinstinguish the sources?

Regards,
Marcin

Hello,

Not sure of the difficulty in this situation since you’d just want to select content between the start and end date of the effectivity satellite based on the driving key, right?

Pretty standard, I cover 3 ways to query an EFS, simply combine the method with a bridge and voila.

Hi Patrick,

The difficulty lies in the fact that link has multiple source, thus we need to introduce multiple satellites to control respective scopes of data. Have a look at the image below:

  • effSat1 tracks link entires from source S1, whereas effSat2 tracks source S2.

obraz

I can think off 3 options:

  1. Create only 1 effectivity satellite - pretty difficult, since keys would have to be tracked on source by source basis anyway
  2. UNION HKs from eff sats and then join to link (like I did above)
  3. Move UNION to upper level, each union would join to 1 eff satellite only, see code below

What do you think about these options?

INSERT INTO BRIDGE (hk_hub1, hk_hub2, …, snapshot_date)
…
SELECT ...

FROM
as_of join hub1 on 1=1
INNER JOIN link1 on hub1.hk_hub1 = link1.hk_hub1
INNER JOIN
(
	– select active links from source1
	SELECT hk_link1
	FROM effSat1_current – current view on top of effectivity satellite
	WHERE snapshot_date>= dv_start_date and snapshot_date<dv_end_date
	and hk_link1 <> 0x0 – eliminate ghost record
) eff_sats on link1.hk_link1 = eff_sats.hk_link1

UNION ALL

SELECT ...

FROM
as_of join hub1 on 1=1
INNER JOIN link1 on hub1.hk_hub1 = link1.hk_hub1
INNER JOIN
(
	– select active links from source2
	SELECT hk_link1
	FROM effSat2_current
	WHERE snapshot_date>= dv_start_date and snapshot_date<dv_end_date
	and hk_link1 <> 0x0 – eliminate ghost record
) eff_sats on link1.hk_link1 = eff_sats.hk_link1

Ah… context!

You’d need a bridge to pass through multiple the boundaries.

For the same relationship from multiple sources it would imply an equi-join from link to multiple efs and that would work just fine, the pointy problem are for those relationships not coming from all sources, they would not have an entry in an EFS and force you into a LEFT join for those. Gets ugly.

BUT a more pertinent observation

  • Why did you model all these into a single link? This implies that you are mapping redundant business processes into a DV. You may have driven yourself into a corner by introducing this complexity yourself!
  • Why have so many EFS in the first place? Does the source not provide an effectivity date and therefore you would not need EFS at all
  • Can EFS be substituted by RTS or STS?

you’re right, context is king, forgot to mentions few details:

  • DWH aggregates data from multiple tenants, each tenant provides it’s own interfaces, therefore multiple satellites, but common links (common process)
  • we have to track validity of assignments (e.g. Offer->Contract), which may change over time, unfortunately sources don’t provide timestamp information about when the assignment changed, therefore we use EFS pattern to calculate it

Gotcha

You know, you can use an RTS on a link (or XTS), probably easier to manage what you have that way you are guaranteed to find a match using an EQUI-JOIN.
Single RTS on a link tracks the occurrence of the relationship, when you query using a driving key you can infer the latest RTS occurrence and then determine the current relationship there.

In the absence of the applied timestamp (it’s very rare case we get reliable appldts from source) RTS seems like an overkill, since there is not a good ChangeHash to compare against and RTS would be flooded with all records every day…
EFS seems like an effective template to use, only the use of it in BV seems more complicated.

Feels like it

I’ve always determined that EFS can be both an RV and a BV; to that end, either

  • You need to have multiple link tables and therefore attain distinctive relationships from each source and associated EFS, using a bridge to sow this together nicely or

  • You’d need to construct a BV EFS that consolidates the the EFS and makes that easier for downstream consumption

Definitely I would not use multiple link tables, it’s all about linking the same concepts, only sourced from different sources (as per tenant).
BV EFS would be also quite complex, given I’d have to UNION all the link stages, therefore loosing the parallelism.
When I think of it further, I could split loading the Bridge on tenant/source by tenant/source basis, the caviot would be the eventual concistency of the bridge as well, not sure if it’s optimal.