Non historized link use case and certification

Hello all,

First of all, thank you very much for your help and for the resources you make freely available to the community; it’s really appreciated.

So I come to you with two major questions. The first is about a use case I am currently studying. I have to deal with large tables that resemble transactional tables. For example, take two of these tables, which record the sales and the returns in our points of sale. For each sale, we get a transaction carrying various information related to our supply chain, as well as data specific to the sale, such as the article and the price applied. Because of this, a non-historized link seems the sensible choice to me; however, the very structure of this table is a big problem. It has no unique sales identifier, so the “composite key” of this source is made up of about 20 columns (the item, the currency, location data and even the date). It is also absolutely unthinkable to “correct” this source table in the short term, which is why I was wondering if you have any ideas about how to handle this specific case.

My second question is about certification or training for specific use cases. Are there any service providers in Europe that could support us in developing a specific project or use case, in particular with the design of our DV2.0 models? Or should we first get certified by one of the companies that offer certification?

Thanks a lot for your time and have a nice day,

Best Regards,

Emmanuel.

Hello Manu!
Use the non historized link. Identify the hubs and identify the primary composite key of your dataset.


How many rows? Large for one project is small for another. Have worked with 45TB sized tables.

You are probably stuck with the 20-column identifier. You’d collate and hash those columns on load, identify the hubs to connect the record to, and have FK columns for the hub keys too. The fallback would be some made-up sequence number or the row number in the feed, which is a poor option.
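To illustrate the "collate and hash them on load" step, here is a minimal Python sketch. The column names, the `||` delimiter, the trim/upper-case normalisation and the choice of SHA-256 are all assumptions made for the example, not details from the thread; the real source has about 20 identifying columns.

```python
import hashlib

# Hypothetical subset of the ~20 identifying columns (real names are not given).
KEY_COLUMNS = ["item_id", "store_id", "currency", "sale_date"]

def composite_hash_key(record, key_columns=KEY_COLUMNS, delimiter="||"):
    """Collate the identifying columns in a fixed order and hash them."""
    parts = []
    for col in key_columns:
        value = record.get(col)
        # Normalise: NULL becomes an empty string, text is trimmed and upper-cased.
        parts.append("" if value is None else str(value).strip().upper())
    collated = delimiter.join(parts)
    return hashlib.sha256(collated.encode("utf-8")).hexdigest()

row = {"item_id": "A-100", "store_id": 17, "currency": "eur",
       "sale_date": "2021-03-01", "price": 9.99}
# Same identifying values with different formatting; price is not part of the key.
same_row = {"item_id": " a-100 ", "store_id": "17", "currency": "EUR",
            "sale_date": "2021-03-01", "price": 12.50}

print(composite_hash_key(row) == composite_hash_key(same_row))
```

The fixed column order and the delimiter matter: without them, two different rows could collate to the same string and therefore to the same hash.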

Here at Datavault we offer consulting services and we could help you, including advising you on appropriate training and certification (which is important). We’re happy to talk about options, just reach out to me on neil.strange@data-vault.com.


Thank you for your reply @AHenning, @neil.strange

So that’s what I thought, I’ll have to make do with those 20 columns. And I don’t have to create a hub for each of these FKs: I can identify the business keys among these 20 columns and create hubs only for them, right?

Yes indeed, excuse me, “large” doesn’t mean much in the world of data… So no, compared to 45TB of data I’m a little kid: there is only 300GB for ~750 million rows.

Okay, fine, I’ll try to send you a message @neil.strange within the week so we can discuss this.

Yes, identify the business keys and create hubs only for them. The non-historized link is still a link, so it can reference multiple hubs.

Non-historised links and sats are edge-case DV structures reserved for real-time streaming. They replace t-links, and they were given the new name because people assumed that any transactional data called for a t-link, even when a transaction file supplied as a batch could just as easily be loaded into a link-satellite.

That said, 20 columns to make a row unique seems extreme. However, there must be business objects being described in this file, otherwise what use is the file?

With that out of the way, a simple idempotent structure is a link-satellite with dependent-child keys, i.e. the business objects are represented by hub tables, and the other columns needed to make each loaded record unique become dependent-child keys. Easy.
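As a rough sketch of that structure, the toy in-memory model below loads rows into two hubs, one link and one link-satellite keyed by dependent-child keys. The class name `SalesVault`, the column names and the choice of `sale_date` and `currency` as dependent-child keys are hypothetical. It is also simplified to insert-only: a real satellite load would carry load dates and compare hash diffs.

```python
import hashlib

def hash_key(*values):
    """Collate key values in a fixed order and hash them into one key."""
    collated = "||".join(str(v).strip().upper() for v in values)
    return hashlib.sha256(collated.encode("utf-8")).hexdigest()

class SalesVault:
    """Toy in-memory vault: two hubs, one link, one link-satellite."""

    def __init__(self):
        self.hub_item = set()    # business keys of items
        self.hub_store = set()   # business keys of stores
        self.link = {}           # link hash key -> (item, store)
        self.sat = {}            # (link hash key, dep-child keys) -> attributes

    def load(self, rows):
        for r in rows:
            # Hubs: sets ignore business keys that are already present.
            self.hub_item.add(r["item_id"])
            self.hub_store.add(r["store_id"])
            # Link: hashed from the hub business keys only.
            link_hk = hash_key(r["item_id"], r["store_id"])
            self.link.setdefault(link_hk, (r["item_id"], r["store_id"]))
            # Satellite: dependent-child keys complete the uniqueness.
            dep_keys = (r["sale_date"], r["currency"])
            # setdefault inserts only when the key is absent, so replays are no-ops.
            self.sat.setdefault((link_hk, dep_keys), {"price": r["price"]})

batch = [
    {"item_id": "A-100", "store_id": "17", "sale_date": "2021-03-01",
     "currency": "EUR", "price": 9.99},
    {"item_id": "A-100", "store_id": "17", "sale_date": "2021-03-02",
     "currency": "EUR", "price": 9.99},
]
vault = SalesVault()
vault.load(batch)
vault.load(batch)  # replaying the same batch adds nothing
print(len(vault.hub_item), len(vault.hub_store), len(vault.link), len(vault.sat))
```

After both loads the vault still holds one item, one store, one link row and two satellite rows, which is the idempotent behaviour described above.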


Thank you for your reply.

That is what I thought I read, so the non-historized link can ONLY be used in the case of streaming data?

Yes for sure, there are a lot of business objects described in this file. Unfortunately, no one has seen fit to integrate a system-coupled transaction identifier or any other logical compound key such as…

OK, so the approach is a classic link-satellite: the link holds the business keys that can be materialized as hubs, and the satellite holds all the other elements, including those that are part of the key. So this is the “good practice” in this case?

Well, the non-historised link is not a “normal” DV pattern. When you have extreme architectural and business constraints, then the NH link could/should be considered.

Ok very well, thank you!
So my case is not extreme enough and can be solved with a “normal” DV pattern, from what I understand? It’s always hard for me to identify the best choice for each case I encounter, but I guess training and experience are the answers to that.

I think it isn’t… but up to you of course.

Have you tried sats with dependent child keys as I have suggested?

And you have a lot more experience than a young padawan trying to learn the basics of Data Vault 2.0 haha. But yes of course, I always try to adapt the advice to the case I encounter.

In my opinion there is a misunderstanding, I certainly did not understand the “dependent child” part. Is it a specific concept or are we “just” talking about a satellite connected to a classic link? (Sorry if the questions seem silly, but it’s to make sure I understand)

EDIT: OK, I didn’t know the concept of “dependent child”. I just did my research and it seems like a good solution. So the idea would be to create hubs for the business keys where that is possible, to treat the key columns that can’t become hubs as dependent children, and to put the elements that are not part of the key in a satellite. Is that right?

A dependent-child key can be modelled into a link table AND/OR a satellite table. I prefer the latter because you can track changes directly against the dep-key with fewer tables.

Dep-keys don’t have hubs — they make sense only within the context of their parent — the hub or link table — they may sub-categorise a parent entity, or (like in your case) be the thing that makes the table unique. Dep-keys can even be called Intra-day keys when what we are modelling is the capture of intra-day changes.
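To make the two placements concrete, the sketch below compares them on two sales of the same item in the same store on different dates. The column names are hypothetical, and including the dep-key in the link hash key (option 1) is assumed here as one possible convention, not something stated in the thread.

```python
import hashlib

def hash_key(*values):
    """Collate key values in a fixed order and hash them into one key."""
    collated = "||".join(str(v).strip().upper() for v in values)
    return hashlib.sha256(collated.encode("utf-8")).hexdigest()

# Hypothetical rows: same item and store, different sale dates.
sales = [
    {"item_id": "A-100", "store_id": "17", "sale_date": "2021-03-01"},
    {"item_id": "A-100", "store_id": "17", "sale_date": "2021-03-02"},
]

# Option 1: dep-key in the link. The link key covers hub keys plus the dep-key,
# so every sale produces its own link row.
link_with_dep = {hash_key(s["item_id"], s["store_id"], s["sale_date"]) for s in sales}

# Option 2: dep-key in the satellite. The link key covers the hub keys only,
# and the satellite row is identified by (link key, dep-key).
link_only_hubs = {hash_key(s["item_id"], s["store_id"]) for s in sales}
sat_with_dep = {(hash_key(s["item_id"], s["store_id"]), s["sale_date"]) for s in sales}

print(len(link_with_dep), len(link_only_hubs), len(sat_with_dep))
```

With the dep-key in the link, the link grows with every transaction; with the dep-key in the satellite, the link stays at one row per item/store relationship and the satellite carries the per-transaction detail.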

Yes ok, I think I get it, I’ve read the various posts on the “debate” of whether to include Dep-keys in the link or in the satellite, and in my case, it’s really about stock “transactions” that are all unique and will never be updated, so there’s really no point in tracking the history for this particular case.

On the other hand this aspect of “sub-categorise a parent entity” may interest me for another part, thanks !

So it looks like I need to put my Dep-keys directly into the link. However, I’m having more and more trouble seeing the big difference from a non-historized link; I guess I’m still missing some elements to really see it.

Thank you very much for your help and any resources you can provide to the community for that matter. Have a nice day! :smiley: