Hi Y’all. My name is Sunny. I know there has been a significant discussion on the Data Vault alliance discussion group before, but I would like to bring this discussion up here to get some suggestions from the team. We are currently building our data warehouse on snowflake using DV methodology and, of course, using DBTVault, which we love so far. I came across Ledger Summary and Ledger Line item tables in the process. Ledger Summary is an aggregated summary table with $ amount column, LedgerId being the Business Key, order id (relationship to Order HUB), and additional columns at the summary level. The Ledger Line item table will give you details on the summary table for a given ledger id with ledgerlineitem being business key and additional columns. LedgerLineitemd doesn’t have a relationship to Order. The question here is how do we model such tables?
This seems to be revolving around a data modelling quandry. You have described how the source system models the relations between Ledger, LedgerLineItem, and Order, but not described how the business sees them.
The business relations of the entities should be informing your vault model, not the source systems, and with this information being compared to the sources, usually a good vault model will fall out.
I will come back when I have a little more time to do a long answer with assumptions and diagrams.
From what you have described, this would be the source tables:
Without a business rules breakdown, I’m going to make the following assumptions about the business rules:
0. We are dealing with 3 business entities, Ledger, Ledger Line Item, and Order.
1a. An Order has at least one Ledger entry
1b. A Ledger entry only has one associated Order
2a. A Ledger entry has at least one Ledger Line Item
2b. A Ledger Line Item only has one associated Ledger
3. A Ledger Line Item is always considered to be part of the Order associated to it’s parent Ledger
4. A Ledger entry ‘total $$ amount’ should equal the total of the ‘$ amount’ values of the Ledger Line Items
From this set of business relations, we can derive the following vault:
This relation set would be what was assumed by most of the automated ‘make a DV from a source’ tools that are available.
I would caution using any field marked as ‘ID’ as a business key without consulting with the business. IDs very often aren’t the business key. Also I would caution that this has a lot of potential to end up as a ‘Source System Vault’, as you are deriving vault architecture from a source (and limitations/ assumptions that were made in that source). The business entities should always be the starting point of your vault model.
From here, you could materialise a Bridge table to show which Line Items were on which Orders, or placed by which Customers (assuming Orders are placed by Customers). You could also do an error checking view to compare the total of the Line Items to the ‘Total $ Amount’. But those are downstream requirements that need business consultation.
Although I don’t disagree with what Will is saying could this be a deficiency in the source system itself?
How are you meant to derive the relationship if the source system (the business rule automation engine the business paid for) doesn’t provide the unit of work the business desires to derive further analysis from?
So imho, first prize is to log a change request or updated SLA with the source to provide that business process representation for you. It will provide the lowest grain to depict that relationship that you can easily map into your raw vault.
We shouldn’t be in the business of building tech. debt in our models that inevitably become source-system data vaults. The above, if possible will reduce the number of tables needed in the joins and perhaps have a DV built to represent the business architecture’s identified business objects, ultimate goal.
If I understand, ledger-line item cannot exist without the ledger? It sounds like a dependent-child key. DepKeys are shunned in Ensemble Modelling but encouraged in Data Vault 2.0.
Hi @patrickcuba , If the ledger-line item can not exists by itself, and is a dependent-child, would it be just added to the LNK_LEDGER_ORDER instead of having a HUB for the LEDGER_LINE_ITEM?
@patrickcuba and @will.list .I am so sorry for not replying back to this thread. I was OOO and just came back. I do have a meeting with the business to understand the actual business model around Ledger Summary and ledgerlineitem. I will get more details about this soon and update you here. Meanwhile, adding one more question to @saqib 's is when you said dependent child and ledger item cannot exist itself. Do I build it as a Link or a satellite? I read your book on the dependent child and you said “You could get away with simply keeping the dependent child keys in one place, the satellite.” in the Dependent child-keys on the link. I am going to get the business model around this but would like to know when you choose to keep it in LINK or in LINK satellite.
I know you also said below from your book
"The dependent child key on the link emphasizes that the dependent child key and all associated attributes is true and always active for the life of the relationship. Having the dependent child key on the satellite means for that active record of the parent entity we track changes.
How do you make this differentiation? Sorry if I am complicating the discussion.
The test is simple, if I take a line item code to a customer and say"please find me the details of line item 101?". If the response is, “we have many line items, what order are you referring to?” The line key cannot be uniquely associated without the parent key.
It is not a composite key so don’t load it to the same hub
It is a dependent-child key now how do i want to track it?
- As a sub-sequence key of a satellite? ← we track the changes to the line item independently to other line items in the satellite for the same parent key. This differs to an MSAT, MSATs track a change in SET
- As a sub-sequence key of a link? ← for the life of that relationship this is the applicable line-item number, in this case, a relationship to what though? We just eliminated the need for a line-item hub
DepKeys in sats reduce the need to have multiple tables to solve a simple thing, and it allows you to better match the overall DV model to the Business Architecture. Line Item is hardly a Business Object
Hi @patrickcuba . for the sub-sequence key of a link, is this the model?
Looks ok, a reference code in a link should have a Reference table association and not a satellite
So maybe this is where you might put the line-item in the sat instead as a depkey instead
@patrickcuba . can you please explain that a bit more?
Thanks, Patrick. It is super helpful for making decisions moving forward. Thank you Thank you.
Hi @saqib . In my situation, there is no direct link between Ledgeritem to Order. It has to go through the ledger all the time. This is the reason I am going to add the line item in satellite to track the changes in line-item for a Ledger and no need for an additional link.
Edit made after Patrick’s comment: Only after discussion with business users on relationship and how business sees them.
Like I said earlier, don’t treat your DV as the sink of technical debt, if the source does not supply this relationship, demand/ask for it. DV starts to get a bad name because it is here that business and IT see the cost escalate. First prize is to solve this debt up front, what you’re suggesting is a workaround to cater for a bad source model == source-system-vault.
If you must go with this model then call this out, “look at what hurdles we have to jump to get you the analytics you need”.
The DV ultimately shows overall health of enterprise IT purchasing decisions; if the tool used to cannot support your business efficiently, of what worth is the tool?
Thanks Patrick. I get that. We are going to have a discussion with source team. They are going to get us the necessary relationships. I already talked to them today and going to tomorrow. Overall i get the understanding on how/why not to model a source vault which is what i am trying to understand. Will keep you posted on this to Y’all.