Is there any precedence for a single table (self-referencing?) LINK?

damn_yankee · 4 August 2023 09:35

For a definition of a LINK in DV 2.0, I always see the phrase “A link is a connection to 2 or more hubs”, or some such, but I want to create a single table link. Something akin to a “self referencing foreign key” from the olden days.

Consider this situation… I’ve got a HUB that stores financial transactions… there are various types of these transactions (identified in the SAT)… transfer, charge, payment, adjustment, . Two of these types (payment and adjustment), can sometimes have a relationship with a third type (prepayment).

I need to model this relationship. Please tell me why an [L_TRAN_TRAN] table wouldn’t fit the bill perfectly?

L_TRAN_TRAN

The LINK definition would look like this…

LTT_def

It doesn’t feel like I’m breaking any DV 2.0 rules here… Does the situation not warrant this approach? Has anyone used this approach with success? What are my modelling alternatives?

Thanks!

patrickcuba · 5 August 2023 08:22

A HUB does not store transactions. If you are referencing DV2.0 standards then you’d know a hub has business keys and nothing else.

This line should be modified in the training, it should say “a link is a connection between TWO or more business objects and ONE or more hubs.” This covers same-as and hierarchy link tables which are nothing more than link tables themselves.

As in most cases in a data warehouse-type platform you are most likely dealing with a batch of transactions landed as a file/table. The optimal and idempotent approach to model this is to model it as a sat_link off a link tables and model the transaction_id/transaction_date as the dependent-child key in that link-sat.

People hear the word transaction and immediately think it should be modelled as a transaction-link. Not so! A T-Link has been renamed to non-historised link to deal with this interpretation. A T-Link/NH-Link also does not have a HashDiff and therefore it cannot be guaranteed to be idempotent.

Non-historised links/sats covers near-real-time data loads where the job of guaranteeing exactly once delivery is the responsibility of the source-technology (Kafka) to do so.

Your link-sat will also contain the transaction type column as just a regular attribute modelled from the source. The link itself will record the business objects participating in that transaction, i.e. account + customer (at least).

damn_yankee · 9 August 2023 14:24

Replace “transaction” with “car”, or “shoe”… would you answer be the same? I appreciate the reply, but I think it misses the mark.

cmeyersohn · 9 August 2023 18:32

You can create a link from a self-referencing hub. For example, data coming in from an employee table that includes the employee’s manager or supervisor business key. A manager or supervisor is also an employee. So the employee record arrives with a self-referencing RELATIONSHIP between an employee and their supervisor. You have a link (relationship) between the employee and their supervisor arriving in the source feed. You can insert the relationship into the link, preserving the standard for a link table, and insert the employee business key into the hub, preserving the standard for a hub. Hopefully, this answers your question.
Respectfully,
Cindi Meyersohn
CDVP2 Instructor

patrickcuba · 9 August 2023 20:57

Hubs do not store transactions — who missed the mark?

Replace Hub with dependent-child key and you have

Less tables, less code, cleaner model
Hubs that actually represent business objects and not relationships/transactions between other business objects

sicarul · 10 August 2023 09:35

I might be wrong but i think you should have the entities in your transaction instead, e.g. the customer A and customer B that are part of the transaction, and/or bank A and bank B source/destination, etc. Find the business entities and those will be the hubs. I’d probably model transactions as a link or transactional link instead, and use satellites on top to enrich the data. Treating the transaction as a business object seems wrong to me.

nabav · 14 August 2023 13:37

damn_yankee wants to capture transaction-to-transaction relationships. I’m not clear what would be the recommended approach for doing that? The replies so far don’t seem to give guidance on this, if we’re excluding the possibility of modelling a transaction as a hub.

patrickcuba · 14 August 2023 22:01

Have you considered a same-as link over a link?

If transactions are related and coming from the same source they should be loaded to the same sat-link
if transactions are coming from different sources then there must be a rule that says these transactions are the same, begs the question: why record the same facts twice?
if indeed this is requirement then could they not be loading to the same link table have seperate link-sats? And by extension a source provides the rule that a transaction relates to another then you could easily build a sal on a link that way.

nabav · 15 August 2023 07:39

The question that was asked isn’t about a “same-as” scenario. It’s about a situation in which the following kind of thing can happen: a “prepayment” transaction arrives, and then at some later date a “payment” transaction arrives. These two transactions are entirely separate and distinct; they’re not the same transaction at all. But the “prepayment” transaction has a relationship to the “payment” transaction.

We want a model that allows this type of situation to be represented, and allows us to write a query that returns (let’s say) a list of prepayment transaction / payment transaction pairs meeting the condition that prepayment transaction date falls within a certain date range.

It seems from your answer that you are suggesting this can be achieved with the help of link sats, but I’m not clear exactly how. Can you clarify?

patrickcuba · 15 August 2023 09:41

so again… why is there a hub_transaction in the diagram? And why can’t the transactions live in the same link-sat?
If you are looking for a rule to identify these “related” transactions then you can build a SAL over the link table based on that rule — this becomes a business vault SAL. Although not a same-as link as the book suggests, the structure is the same.

nabav · 15 August 2023 15:28

The reason there’s a Hub Transaction in the diagram is that the person who posted the question wanted to model Transaction as a Hub. You’re saying that this is bad practice and it’s better to model Transaction as a Link Sat.

That’s all fine as far as it goes. But we’re still left with the question of how to model Transaction-to-Transaction relationships. Your answer is that we should introduce a “same-as link over a link”.

My response to this is, 1) I thought we’re supposed to build links over hubs, not over links? I thought building links over links was considered an anti-pattern? And 2) even if we did introduce a SAL over a link, why would that help? The things we’re trying to link to each other are Transactions; and a Transaction ISN’T represented by an entry in the Link table. A Transaction is instead represented by an entry in the LINK SAT table. You said yourself: “The optimal and idempotent approach … is [to model Transaction] as a Sat Link off a link table and model the transaction_id/transaction_date as the dependent-child key in that link-sat.”

Therefore, what’s needed isn’t a way of relating one Link Table row to another row from the same Link Table. And so a SAL over a Link won’t help. What’s needed instead is a way of relating one Link Sat row to another Link Sat row. How do you do that?

AHenning · 15 August 2023 21:04

Describe the incoming dataset with examples and I will explain how you can do it.

patrickcuba · 15 August 2023 22:46

Correct transaction is not a business object — if you start sourcing more transaction data I assume you would then be loading to the same hub_transaction, would the transaction id not clash?
Transactions also do not change, they are facts, and interaction of business objects that is immutable, like event data. You never go back and change a transaction. Therefore do you model it as a t-link? No, a t-link is renamed to a non-historised link, why? Because modellers were by default making txns a t-link. NH-links also don’t have hashdiffs, therefore if you run the load twice how do you ensure you’re not loading duplicates? NH-links are really for NRT streaming, because the act of ensuring exactly once deliver is with the source technology — i wouldn’t even add hash-keys to these tables because hashing is expensive! Therefore where do batched txns go? Link-sats, the dependent-child key is the transaction id.

You’re not suppose to model link on links — yes you are correct, why then, do you need to relate one transaction to another? I assume you know what you’re doing and you need this, you can create a SAL on a link mapping one txn to another. Now this means you’d need the dep-key in the link? If so you might be better of building that logic into a BV SAL because it sounds like you’re needing to manage this rule in DV. Now if you do not add it to the link then you might need an adjacent BV-link-sat mapping the two together as dep-keys. But this is a rule you have to manage, the best solution is if the source supplies this relationship to you, then in which case you’re simply surfacing that using an info-mart.

Incorrect, txn is the interaction between participating business objects – the transaction, the details of the txn goes into the link-sat.

The optimal and idempotent approach … is [to model Transaction] as a Sat Link off a link table and model the transaction_id/transaction_date as the dependent-child key in that link-sat – yes, the to batch txns is idempotency.

nabav · 16 August 2023 14:20

Let’s apply what you’ve said to a real source model, as then it’ll be easier to pin down what you have in mind. Here’s a source model:

Note that ASSOCIATED_PREPAYMENT_TRANSACTION_NUMBER (FK) is nullable, as only some transactions have an associated prepayment transaction.

Now, leaving aside the complication of the associated prepayment transaction, I believe (based on what you’ve said) that the basic Data Vault model you would advocate here is:

I haven’t included all the metadata columns and so forth but hopefully you get this idea.

Is this what you have in mind? If not, just let me know the ways in which the model you have in mind differs from this model. As well as how you would extend it to capture the Transaction-to-Transaction relationship.

One observation I would make, just in passing, is that a plain old Link Table is incapable of storing Transactions. A row in the OUR_LINK table doesn’t represent a Transaction; rather, a row in the OUR_LINK table might have 10,000 Transactions against it - and these 10,000 Transactions are found in the OUR_LINKSAT table. That is, OUR_LINK is at a higher grain than the Transaction grain that we find in OUR_LINKSAT. There’s nothing necessarily wrong with that, but it’s important to be clear about it.

AHenning · 16 August 2023 19:17

Model the transaction table as a Non historized link. Pk is hub_transaction, also model another hub inside the nhl: hub_transaction for associated…transaction_number. This is a very straight forward.

nabav · 16 August 2023 19:55

Yes, that’s right, it’s very straightforward if you do it like that. It’s less straightforward for a person who objects both to Transaction Hubs and to NHL transaction tables (in a batch situation). So I’m interested to see how a person who holds these objections would approach it.

patrickcuba · 21 August 2023 07:39

Perhaps there’s a of lost in translation happening here

HUB_TRANSACTION_TYPE is not a business object and would never be modelled as such
Counterparty number is a business object, yes this should be hub

Now – I am suprised the txn does not have another participating business object because a transaction is the interaction between business objects and event, perhaps against an account + product or something. Somehow this is not occuring here.

Again – no T-Link
Again – a transaction is not a business object

If this is all you’ve got then I refer back to the dependent-child key I explained earlier, therefore the model will be:

HUB_COUNTERPARTY
SAT_TRANSACTIONS ← this will have your dependent child key(s)

That’s it, two tables where you had four

nabav · 21 August 2023 09:46

We can change the example to use Account rather than Transaction Type, so that we end up with something that you would recognize as having more of a ring of plausiblity. So, a source model like this:

Are you then advocating a Data Vault model like the following?

Is this what you have in mind? If not, just let me know the ways in which the model you have in mind differs from this model. As well as how you would extend it to capture the Transaction-to-Transaction relationship.

patrickcuba · 21 August 2023 23:27

Better – but for the sake of not repeating myself, refer to earlier feedback

nabav · 22 August 2023 11:08

The thing about your earlier feedback (regarding the Transaction-to-Transaction relationship) is that it refers a number of times to a SAL-on-a-Link: a SAL with 2 foreign keys to the same link table.

I’ve read through the feedback a number of times and trying to parse it, but I’m still not grasping how you envisage this working, in light of the fact that the actual transactions being related to each other can’t be identified without the use of their dependent child keys. Do you mean something like this?

The semantics of OUR_SAL_ON_OUR_LINK would have to be: for the OUR_LINKHASH_KEY value on this row, at least one transaction posted against the {Counterparty + Account} represented by the OUR_LINKHASH_KEY value is associated with a (prepayment) transaction posted against the {Counterparty + Account} represented by the OUR_LINKASH_KEY_OF_PREPAYMENT value.

Topic		Replies	Views
Hub - Link - Satellite Data Vault 2.0 link , hub , satellite	1	189	25 November 2024
Modeling of link: reference a hub by businessKey instead of hashKey Data Vault 2.0 link	4	54	16 June 2025
Why are hubs, links and satellites separate tables? Data Vault 2.0 business-key , link , hub , satellite	11	134	7 March 2025
Initial Loading of Link Tables from Staging Data Vault 2.0 patricksworks	8	1407	24 April 2024
Self Join Scenario Data Vault 2.0	3	90	20 October 2024

Is there any precedence for a single table (self-referencing?) LINK?

Related topics