HUB or LNK/DC when unit of work is different for source files for the same BK

Houghster · 9 June 2023 10:10

Hi all, I have a issue getting my head around what seems like a simple model in terms of whether an object should be a HUB or a LNK with a DC.

To set a simple picture, we digest transactional data from a CDC and our DV model nicely adds these source files to business object hubs. All well and good. But in the case where you are dealing with transactional entities where the source files are split between headers and lines and also have additional extension files that target either the header or line AND arrive separately, the waters start to muddy.

So, for an order hub we have a main header file which provides the HUB key and a nice set of other keys to provide a unit of work for the LNK and the rest are in the SAT, supporting files attach nicely as additional SATs to the HUB, all well and good.

The lines however get a bit messy as I wanted to avoid having a HUB for the lines and have it as a LNK with the order line being the DC. Also at this level are a number of keys to encompass the unit of work at the line level. The issue comes in providing the SATs for the extension files that are also at line level as they do not contain the unit of work from the main line level file, so you cannot hang it from the LNK key as it cannot be derived in staging, it only lives at the PK of the line itself. Forcing you to make the line level data a HUB.

In a similar vein having the order/order line unit of work in one LNK would be the best but due to the grain and separate arrival times I cannot - and I push this responsibility to the BV as a view.

I am desperately trying to not create a source driven DV, but I also cannot wait to combine these files in staging as you may get headers with no lines and vice versa adding a layer of dependency when none may exist.

Anyone faced this dilemma? Thoughts would be appreciated.

patrickcuba · 14 June 2023 02:40

Do you have a diagram of your model with some sample data?

Hard to follow, order lines are definitely not a hub, but also consider that you could send your dependent-child key to the link-satellite instead

Houghster · 18 June 2023 13:01

Hi Patrick,

Apologies for the tardy response

I certainly do have a model!

orders:

Source files are fed from system tables and we have 4 source files for the order header and 3 for the order line. The PK business keys are present in all source files. It is the link keys where things get muddy.

For the header, life is easy the unit of work is present in the main header source and supporting SATs only need to represent at the header BK and are not participants in the header LNK (but benefit from it).

With the order lines though, moving the order line field as a DC of the LNK obviously removes the need for the hub. But!

SAT_ORDER_LINE_AULT2F3_OEP55 contains
BK->ORDER_LINE
FK->ORDER
FK->PRODUCT
FK->STOCKROOM

whereas!

SAT_ORDER_LINE_AULT2F3_OEAF only contains
BK->ORDER_LINE
FK->ORDER

no product or stockroom keys in the supporting source file and I can’t infer them unless I join to the other sat which gives a dependency so not an option. If a supporting SAT arrives before the feeder SAT then I would end up with a LNK entry with just the business key and all the rest of the unit of work would be zero keys, implying an optional relationship which is not the case (its just arrived early - is this really an issue I ask myself, yes if INFO layer extracts as it would need to check if a line actually exists - yet more processing)

If I move the line number to the SAT I am still in the same boat as the supporting SATs cannot participate in the order line LNK.

If I look at another transactional area - purchase orders, the LNK with DC works nicely as all SATs contain the full unit of work at a line level. Even SATs with lower grain work nicely.

Hope this makes sense!

patrickcuba · 21 June 2023 01:26

Yea, I doubt that order-line would be unique enough to be considered a business key — however it would be dependent on a parent key — make that a dependent-child key imo— also you could either have the dep-key in the link or in the sat, depends on how you want to manage it, query it… i prefer the latter because my link table will remain a unique list and the dep-key would be loaded in the link-sat along with its order line details.

Topic		Replies	Views
hub design question Data Vault 2.0	3	374	23 March 2023
Business Entity with multiple Business Keys Data Vault 2.0 business-key	1	59	19 March 2025
Should HUB be loaded only from actual source and how to define- biz. key if 2 source have diff. keys Data Vault 2.0 dbtvault , business-key , link , hub , datamodelling , satellite	5	129	3 July 2024
HUB Creation from Multiple Source system Data Vault 2.0	15	2268	11 June 2022
To subtype or not Data Vault 2.0	8	380	16 November 2022

HUB or LNK/DC when unit of work is different for source files for the same BK

Related topics