Use Date as Dependent Child (Forecasts, historical data)

Hi,

are there any best practices for modelling transactions or forecast without a transaction-ID (Sales-ID, Order-ID,…) suitable as BK.

I came across some sources providing data like this (filebased or database tables):

Product Customer ForcastDate SalesAmount
P1 C1 01.01 100
P1 C1 01.02 90
P1 C1 01.03 110

Later some changes occur (Source only provides Delta):

Product Customer ForcastDate SalesAmount
P1 C1 01.03 50 (Updated the amount for 01.03)
P1 C1 01.04 200
P1 C1 01.05 250

This could be forecasts for particular years, months, dates or even historical data for a particular date/time which changes over time.

What are the best practices for this?

A) Dep Child in Link

  • h_product
  • h_customer
  • l_product_customer1
    • l_product_customer_hk
    • ldts
    • h_product_hk
    • h_customer_hk
    • forcastDate (Dep Child?)
  • sat_product_customer_…_SalesAmount
    • l_product_customer_hk
    • ldts
    • hdiff
    • SalesAmount

B) Dep Child in Sat

  • h_product
  • h_customer
  • l_product_customer
    • h_product_hk
    • h_customer_hk
  • sat_product_customer_…_SalesAmount
    • l_product_customer_hk
    • ldts
    • forcastDate (Dep Child?) as part of PK
    • hdiff
    • SalesAmount

C) Multi-Active Sat? (I dont get the full set, only delta)

I’ve read the books, searched online, attended the class but it is hard to find an (definitive) answer.

Documents about sales or transactional data utilizing a Sales-ID/Order-ID/… (so one date per ID) and attach the metrics-satellite to the hub (and date in hub-sat) - That’s the easy one :slight_smile:

Most documents and samples on dependent-child-keys are about LineItem-Numbers, Sequence-Numbers (generated or not) or codes but never on dates or similar (Month/Quarter/Year). The only hint I found, was in the CDVP2-Materials stating “timestamp of an email message” as an example for a dep-key.

Yes! Option B is the most scalable option, MSATs don’t fit here!

We use transaction date as a dep-key and call it an “intra-day key” — this approach supports idempotence and scale!

3 Likes

Hi Chris,
I am an CDVP instructor, so let me try clarify the material as best I can and offer a suggestion. The definition of a dependent child is essentially an attribute necessary for uniqueness in a relationship, but does not qualify as a business key. Thus, dependent child attributes are found in links, and used to maintain the grain of your source dataset. The dataset you have is of the grain product, customer and forecast date, so the link option does the best job of allowing recreation of source - which by extension is needed for auditability.
Having said that, you always have choices based on scalability and performance that allows for different non-standard patterns, but automation tools will likely require extension to support them - so that is a factor in your decision as well. It looks like automatedv supports this via selecting a multi-active satellite “Child Dependent Key(s) (src_cdk)”, but I have no experience using this approach.

I have no doubt either option can be made to work, and you can try each and see which is easier to load and query based on your specific use-case, data volume and automation framework.

Hope that helps.

Bruce