Is there any precedence for a single table (self-referencing?) LINK?

For a definition of a LINK in DV 2.0, I always see the phrase “A link is a connection to 2 or more hubs”, or some such, but I want to create a single table link. Something akin to a “self referencing foreign key” from the olden days.

Consider this situation… I’ve got a HUB that stores financial transactions… there are various types of these transactions (identified in the SAT)… transfer, charge, payment, adjustment, . Two of these types (payment and adjustment), can sometimes have a relationship with a third type (prepayment).

I need to model this relationship. Please tell me why an [L_TRAN_TRAN] table wouldn’t fit the bill perfectly?

L_TRAN_TRAN

The LINK definition would look like this…

LTT_def

It doesn’t feel like I’m breaking any DV 2.0 rules here… Does the situation not warrant this approach? Has anyone used this approach with success? What are my modelling alternatives?

Thanks!

A HUB does not store transactions. If you are referencing DV2.0 standards then you’d know a hub has business keys and nothing else.

This line should be modified in the training, it should say “a link is a connection between TWO or more business objects and ONE or more hubs.” This covers same-as and hierarchy link tables which are nothing more than link tables themselves.

As in most cases in a data warehouse-type platform you are most likely dealing with a batch of transactions landed as a file/table. The optimal and idempotent approach to model this is to model it as a sat_link off a link tables and model the transaction_id/transaction_date as the dependent-child key in that link-sat.

People hear the word transaction and immediately think it should be modelled as a transaction-link. Not so! A T-Link has been renamed to non-historised link to deal with this interpretation. A T-Link/NH-Link also does not have a HashDiff and therefore it cannot be guaranteed to be idempotent.

Non-historised links/sats covers near-real-time data loads where the job of guaranteeing exactly once delivery is the responsibility of the source-technology (Kafka) to do so.

Your link-sat will also contain the transaction type column as just a regular attribute modelled from the source. The link itself will record the business objects participating in that transaction, i.e. account + customer (at least).

Replace “transaction” with “car”, or “shoe”… would you answer be the same? I appreciate the reply, but I think it misses the mark.

You can create a link from a self-referencing hub. For example, data coming in from an employee table that includes the employee’s manager or supervisor business key. A manager or supervisor is also an employee. So the employee record arrives with a self-referencing RELATIONSHIP between an employee and their supervisor. You have a link (relationship) between the employee and their supervisor arriving in the source feed. You can insert the relationship into the link, preserving the standard for a link table, and insert the employee business key into the hub, preserving the standard for a hub. Hopefully, this answers your question.
Respectfully,
Cindi Meyersohn
CDVP2 Instructor

1 Like

Hubs do not store transactions — who missed the mark?

Replace Hub with dependent-child key and you have

  • Less tables, less code, cleaner model
  • Hubs that actually represent business objects and not relationships/transactions between other business objects

I might be wrong but i think you should have the entities in your transaction instead, e.g. the customer A and customer B that are part of the transaction, and/or bank A and bank B source/destination, etc. Find the business entities and those will be the hubs. I’d probably model transactions as a link or transactional link instead, and use satellites on top to enrich the data. Treating the transaction as a business object seems wrong to me.

damn_yankee wants to capture transaction-to-transaction relationships. I’m not clear what would be the recommended approach for doing that? The replies so far don’t seem to give guidance on this, if we’re excluding the possibility of modelling a transaction as a hub.

Have you considered a same-as link over a link?

  • If transactions are related and coming from the same source they should be loaded to the same sat-link
  • if transactions are coming from different sources then there must be a rule that says these transactions are the same, begs the question: why record the same facts twice?
  • if indeed this is requirement then could they not be loading to the same link table have seperate link-sats? And by extension a source provides the rule that a transaction relates to another then you could easily build a sal on a link that way.

The question that was asked isn’t about a “same-as” scenario. It’s about a situation in which the following kind of thing can happen: a “prepayment” transaction arrives, and then at some later date a “payment” transaction arrives. These two transactions are entirely separate and distinct; they’re not the same transaction at all. But the “prepayment” transaction has a relationship to the “payment” transaction.

We want a model that allows this type of situation to be represented, and allows us to write a query that returns (let’s say) a list of prepayment transaction / payment transaction pairs meeting the condition that prepayment transaction date falls within a certain date range.

It seems from your answer that you are suggesting this can be achieved with the help of link sats, but I’m not clear exactly how. Can you clarify?

1 Like

so again… why is there a hub_transaction in the diagram? And why can’t the transactions live in the same link-sat?
If you are looking for a rule to identify these “related” transactions then you can build a SAL over the link table based on that rule — this becomes a business vault SAL. Although not a same-as link as the book suggests, the structure is the same.

The reason there’s a Hub Transaction in the diagram is that the person who posted the question wanted to model Transaction as a Hub. You’re saying that this is bad practice and it’s better to model Transaction as a Link Sat.

That’s all fine as far as it goes. But we’re still left with the question of how to model Transaction-to-Transaction relationships. Your answer is that we should introduce a “same-as link over a link”.

My response to this is, 1) I thought we’re supposed to build links over hubs, not over links? I thought building links over links was considered an anti-pattern? And 2) even if we did introduce a SAL over a link, why would that help? The things we’re trying to link to each other are Transactions; and a Transaction ISN’T represented by an entry in the Link table. A Transaction is instead represented by an entry in the LINK SAT table. You said yourself: “The optimal and idempotent approach … is [to model Transaction] as a Sat Link off a link table and model the transaction_id/transaction_date as the dependent-child key in that link-sat.”

Therefore, what’s needed isn’t a way of relating one Link Table row to another row from the same Link Table. And so a SAL over a Link won’t help. What’s needed instead is a way of relating one Link Sat row to another Link Sat row. How do you do that?

Describe the incoming dataset with examples and I will explain how you can do it.

Correct transaction is not a business object — if you start sourcing more transaction data I assume you would then be loading to the same hub_transaction, would the transaction id not clash?
Transactions also do not change, they are facts, and interaction of business objects that is immutable, like event data. You never go back and change a transaction. Therefore do you model it as a t-link? No, a t-link is renamed to a non-historised link, why? Because modellers were by default making txns a t-link. NH-links also don’t have hashdiffs, therefore if you run the load twice how do you ensure you’re not loading duplicates? NH-links are really for NRT streaming, because the act of ensuring exactly once deliver is with the source technology — i wouldn’t even add hash-keys to these tables because hashing is expensive! Therefore where do batched txns go? Link-sats, the dependent-child key is the transaction id.

You’re not suppose to model link on links — yes you are correct, why then, do you need to relate one transaction to another? I assume you know what you’re doing and you need this, you can create a SAL on a link mapping one txn to another. Now this means you’d need the dep-key in the link? If so you might be better of building that logic into a BV SAL because it sounds like you’re needing to manage this rule in DV. Now if you do not add it to the link then you might need an adjacent BV-link-sat mapping the two together as dep-keys. But this is a rule you have to manage, the best solution is if the source supplies this relationship to you, then in which case you’re simply surfacing that using an info-mart.

Incorrect, txn is the interaction between participating business objects – the transaction, the details of the txn goes into the link-sat.

The optimal and idempotent approach … is [to model Transaction] as a Sat Link off a link table and model the transaction_id/transaction_date as the dependent-child key in that link-sat – yes, the to batch txns is idempotency.

Let’s apply what you’ve said to a real source model, as then it’ll be easier to pin down what you have in mind. Here’s a source model:

image

Note that ASSOCIATED_PREPAYMENT_TRANSACTION_NUMBER (FK) is nullable, as only some transactions have an associated prepayment transaction.

Now, leaving aside the complication of the associated prepayment transaction, I believe (based on what you’ve said) that the basic Data Vault model you would advocate here is:

image

I haven’t included all the metadata columns and so forth but hopefully you get this idea.

Is this what you have in mind? If not, just let me know the ways in which the model you have in mind differs from this model. As well as how you would extend it to capture the Transaction-to-Transaction relationship.

One observation I would make, just in passing, is that a plain old Link Table is incapable of storing Transactions. A row in the OUR_LINK table doesn’t represent a Transaction; rather, a row in the OUR_LINK table might have 10,000 Transactions against it - and these 10,000 Transactions are found in the OUR_LINKSAT table. That is, OUR_LINK is at a higher grain than the Transaction grain that we find in OUR_LINKSAT. There’s nothing necessarily wrong with that, but it’s important to be clear about it.

Model the transaction table as a Non historized link. Pk is hub_transaction, also model another hub inside the nhl: hub_transaction for associated…transaction_number. This is a very straight forward.

Yes, that’s right, it’s very straightforward if you do it like that. It’s less straightforward for a person who objects both to Transaction Hubs and to NHL transaction tables (in a batch situation). So I’m interested to see how a person who holds these objections would approach it.

2 Likes

Perhaps there’s a of lost in translation happening here

  1. HUB_TRANSACTION_TYPE is not a business object and would never be modelled as such
  2. Counterparty number is a business object, yes this should be hub

Now – I am suprised the txn does not have another participating business object because a transaction is the interaction between business objects and event, perhaps against an account + product or something. Somehow this is not occuring here.

  • Again – no T-Link
  • Again – a transaction is not a business object

If this is all you’ve got then I refer back to the dependent-child key I explained earlier, therefore the model will be:

  • HUB_COUNTERPARTY
  • SAT_TRANSACTIONS ← this will have your dependent child key(s)

That’s it, two tables where you had four

We can change the example to use Account rather than Transaction Type, so that we end up with something that you would recognize as having more of a ring of plausiblity. So, a source model like this:

image

Are you then advocating a Data Vault model like the following?

image

Is this what you have in mind? If not, just let me know the ways in which the model you have in mind differs from this model. As well as how you would extend it to capture the Transaction-to-Transaction relationship.

1 Like

Better – but for the sake of not repeating myself, refer to earlier feedback

The thing about your earlier feedback (regarding the Transaction-to-Transaction relationship) is that it refers a number of times to a SAL-on-a-Link: a SAL with 2 foreign keys to the same link table.

I’ve read through the feedback a number of times and trying to parse it, but I’m still not grasping how you envisage this working, in light of the fact that the actual transactions being related to each other can’t be identified without the use of their dependent child keys. Do you mean something like this?

The semantics of OUR_SAL_ON_OUR_LINK would have to be: for the OUR_LINKHASH_KEY value on this row, at least one transaction posted against the {Counterparty + Account} represented by the OUR_LINKHASH_KEY value is associated with a (prepayment) transaction posted against the {Counterparty + Account} represented by the OUR_LINKASH_KEY_OF_PREPAYMENT value.

1 Like