Technical Advice on SAL and PIT Related to Same Hub

Thank you Patrick,

Sorry for the delayed response. I’ve been looking after my sick son.

Anyway…

If I change the name from SAL_HUB_CUSTOMER_HK to simply SAL_CUSTOMER_HK is that more agreeable?

I understand the “eventually consistent” approach, however some keys will never have same as equivalents from another system. Given I’ll be creating a PIT off the back of the SAL, I’m either going to need to bring all keys into my SAL and make building the PIT easier, else I need to accommodate for it in the logic to build the PIT (to take a quote from your book - you need to pay the piper at some point). Is there inherent value in doing this logic at the later stage (finding all of the customer keys that aren’t in the SAL and adding the logic into the PIT as opposed to ensuring the SAL has all possible keys)?

I’m going off Dan’s book and yours. Neither really discuss how to handle a PIT on top of a SAL where a SAL doesn’t contain every key found in the associated HUB. So I’d taken from that it should, hence my approach.

SAL is not a HUB — so yea it will be Vault-like. It is important to keep these things the way they are. Sounds fickle but that’s the difference between discipline and chaos.

Shouldn’t have to — a PIT over a Link table is what a SAL is. Recall that a PIT can be based on either, just like STS and RTS. Hence we know a SAL is a Link then what is stopping you from using it as such?

Thank you!

Yep - I get that it’s a link, but without adding all the ‘non-related’ keys in, it’s an incomplete link. Example:

Customer A exists only in ‘Platform1’. As such, the mapping table that is used as the basis of the SAL has no record of this customer.

If I simply use the mapping table as the sole basis for the SAL, and then create a PIT over the top of this SAL table there is no reference to Customer A, hence Customer A would not appear in the PIT. The PIT is then used as the basis for the SCD2 Customer Dimension, so Customer A doesn’t appear in dim_customer, and the business complains.

If however I use the mapping table as the base of the SAL (rather than the sole input), then I can augment it with UNION DISTINCT of all sources of Customer keys (much the same as we do with the HUB tables to ensure completeness). Even if these appear in the SAL with only one column (with no ‘same as’ relationship), then at least I know that the SAL is complete. I can then base my PIT on the complete SAL, knowing all customers are accounted for. Further, I can then be assured that my dim_customer will contain all customers, whether they exist in one source system or many.

I think this is basically a question of ‘at what point do you do the work to ensure that you have got all customers from all sources, where the same customer can exist in multiple sources with different id’s and so a SAL is involved, and you want point in time to create an SCD2’:

  • when building the SAL, ensure that all customers are in it, even if they don’t have a same-as relationship;
  • when building the PIT off the SAL, which would mean that you don’t just build the PIT off the SAL, but you build the PIT off the SAL plus all sources of a customer, to ensure you have picked up everything; or
  • not in the DV at all, but in the Kimball model (which would be far more complex given you already have a PIT with 90% of the customers already neatly time-seriesed across multiple systems).

Have you tried building a Business Vault Link?

You can have many raw vault SALs and resolve the structures into a single BV link (to rule them all)