Modeling question on Child Key on Data Vault 2.0

Hello All,

Can you please share your thoughts about dependent key and modeling question in DV 2.0

We have source table about customer and address details. One customer can have multiple addresses.

Background information about source data
Customer ID
100 - Customer A
200 - Customer B
300 - Customer C

Address Type
1 - denotes default billing
2 - denotes shipping address
3 - transaction
4 - locator address

Address Code
Auto incremental number per customer
Customer A - 1, Customer A - 2, Customer A -3
Customer B - 1, Customer B - 2, Customer B - 3, Customer B - 4

Note: One customer can have multiple shipping addresses

Source Customer Address table structure sample:

Customer ID | Address Code | Address Type | address Line 1
100 | 1 | 1 | george st, sydney
100 | 2 | 2 | new castle
100 | 3 | 2 | melbourne
200 | 1 | 1 | london
200 | 2 | 2 | new york
300 | 1 | 1 | brisbane

My initial go to approach to model this scenario:

Hub_Customer
Sat_Customer

Hub_AddressType
Sat_AddressType

Link_Customer_Address

L_Hkey | Hub_customer_Hkey | Hub_AddressType_Hkey | Address Code (dependent key)

Address Code in Link table:
Treating Address Code as a dependent key, since it is auto incremental and have no meaning in business process, but this helps mapping between customer and address type (customer may have many address types (shipping, billing, etc))

Link_SAT_Customer_Address
L_Hkey | Address Line 1 |Address Line 2 | Load datetime |rec source

Question: Do you think this is good approach ? any comments / inputs ?
Appreciated your inputs.

regards,
Rudra

Hi Rudra,

AddressType to me is not a Hub. I would model this as one Satellite with a DCK:

Sat_Customer_Address:
Hub_Customer_Hkey - Hashkey
AddressCode - DCK
AddressType - Attribute
AddressLine1 - Attribute

In my current implementation, I actually never use the Satellite+DCK pattern, but always the Multi-Active Satellite pattern. Mainly because in that pattern, you can deal with DCKs that get deleted (which happens a lot in my sources which all deliver full snapshots). In the case of the Satellite+DCK pattern, you will only track changes per DCK. So if this:
100 | 1 | 1 | george st, sydney
changes to:
100 | 1 | 1 | somewhere else
you will track that in the satellite. But what happens when someone changes their default billing address? Will only AddressLine1 change (like in my example)? Or will the source insert a new record:
100 | 4 | 1 | somewhere else
and stop delivering the original record?
In that case you will now have two active default billing records for this customer and you will probably need to create a business rule to derive what is the valid default one (most likely the one with the latest applied date)

I’d actually like to hear @patrickcuba 's opinion on this: how do you cope with hard-deleted DCK records?

2 Likes

I’m not sure that address type is a separate concept here.

I imagine that for defaults the same address could be both a bill to and a ship to, or just one, and it there may be multiple bill to / ship tos for each customer. (and there are probably other types too)

So Address would be its own thing as far I can tell so you’d have

  • hub + sat customer,
  • link + sat customer address with your types in it as a’default’ address type (otherwise you’ll have duplicate the address in the hub even if the address is playing multiple roles)
    (Also a couple of order-bill to/ship to address links with the actual bill to/ship to from the order would probably be needed for when your bill to or ship to changes from the default
  • hub + sat address which is a specific location. address type then doesn’t form part of the sat_hashdiff for address

Hi @Nat

Thank you for your response.
I could not follow your second point link + sat. can you pls elaborate more.

hub + sat address -
What would be the business key on Hub_Address? Address ID is auto incremented as per customer.?

This is my understanding based on your reply.

Appreciated your inputs
Cheers, Rudra

Hi @FrenkLoonen

Thank you for your inputs.

One customer can have only one billing address any given scenario. New billing address will be replaced with old one. Do not have to maintain old address details.

But customer may have more than one shipping address and all are valid.

Cheers, Rudra

I think I’d make a composite or concatenated key between customer ID and address ID so that there is a unique key for each address (one customer has many addresses) if it is just autoincremented for each customer. Sometimes you need to create a real PK. I know some people don’t like it but it’s reality.

This gives the address hub e.g. 100_1 as the primary key, and the sat has the address info. Link customer / address has e.g. 100, 100_1 as the keys, You don’t need the address details in the link sat, just the address type.

Hi @Nat Thank you. So you’re proposing BK construction - concatenated by customer ID and address ID in the Hub_Address table ? I’m thinking about downstream implications or data consumption at BDV !

basically yes, or, you could also have a multiple attribute key including the customer number and the address code in the hub.

if you are going to concatenate, best would be that the source does this for you, or your extract / ingest does it, or you do it. You probably won’t get the source to change it though.

no — no composite key here — the dependent child key approach is the correct approach

@rudra having the dep-key in the sat means you will have only two artefacts in your model

hub ← actuall business object
sat_address ← modelled with either a sat+dep-key OR MSAT

K.I.S.S

So @patrickcuba you’re proposing this model i.e. satellite with dependent child key? is this correct ?

1 Like

@patrickcuba Thank you so much. This is really helpful.

Ok, so I hadnt really understood dependent child keys before this - so it’s at a lower/almost link level granularity than the hub.
This version makes sense - I think I didn’t like the ‘item code’ example I’ve seen for it as item/product is nearly always a first class business concept and not a dependent/child