Why are hubs, links and satellites separate tables?

Please stay on topic

Lol, I did; I realise you’ve missed what I had already shown you. Go back to the blog and search for the keyword “JoinFilter” – this is how Snowflake indicates that it is using Bloom filters underneath.
And then you can watch the video and watch how I did it.

Amazing, you’re learning more than one thing today.

Can you provide me with the sql code that extracts the latest bk/hk from a satellite without using any type of preprocessed storing technique? If you have any other technique I am interested.

Ask yourself why we do pre-processing (building CPITs, PITs and the excellent SNOPIT) versus why you would force users to select the max per parent key in every query… cost. Solve that complexity once and you will cut costs. For an experienced architect the benefits should be obvious. The examples I provided are excellent and even include SQL code for your benefit.
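To make the trade-off concrete, here is a minimal sketch of both approaches, demonstrated with SQLite so it actually runs. The table and column names (`sat_customer`, `customer_hk`, `load_date`, `pit_customer`) are illustrative, not from any of the posts above:

```python
import sqlite3

# Hypothetical satellite: one row per hash key per load date.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sat_customer (
    customer_hk TEXT,
    load_date   TEXT,
    name        TEXT
);
INSERT INTO sat_customer VALUES
    ('hk1', '2024-01-01', 'Alice'),
    ('hk1', '2024-02-01', 'Alicia'),
    ('hk2', '2024-01-15', 'Bob');
""")

# Per-query approach: every consumer repeats the MAX(load_date) per parent key.
latest = con.execute("""
    SELECT s.customer_hk, s.name
    FROM sat_customer s
    JOIN (SELECT customer_hk, MAX(load_date) AS load_date
          FROM sat_customer
          GROUP BY customer_hk) m
      ON s.customer_hk = m.customer_hk AND s.load_date = m.load_date
    ORDER BY s.customer_hk
""").fetchall()
print(latest)  # [('hk1', 'Alicia'), ('hk2', 'Bob')]

# PIT-style approach: solve the complexity once, store the pointers,
# and every downstream query becomes a cheap equi-join.
con.execute("""
    CREATE TABLE pit_customer AS
    SELECT customer_hk, MAX(load_date) AS load_date
    FROM sat_customer
    GROUP BY customer_hk
""")
latest_via_pit = con.execute("""
    SELECT s.customer_hk, s.name
    FROM pit_customer p
    JOIN sat_customer s
      ON s.customer_hk = p.customer_hk AND s.load_date = p.load_date
    ORDER BY s.customer_hk
""").fetchall()
print(latest_via_pit)  # same result, via the precomputed table
```

Both queries return the same rows; the difference is whether the aggregate is paid for on every query or once at load time.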

The reason I used the aggregate as the comparison is that it is something we always have to do in Data Vault. I don’t know if Hook has something similar to PIT tables, but at least I know that Hook stores all business keys in the satellite, or frame; Andrew, please correct me if I am wrong here. So it made sense to compare these constructs.

Regarding cost: there are many factors that drive the costs of a data platform. I don’t have a straight answer because it is very dependent on the implementation. It might be good to use PITs to pre-aggregate data for equi-joins, but PITs will also add complexity with more tables, loading processes and dependencies.
A good way to cut costs is to use a payment model where you don’t pay per query.

Hey Andreas,

Yes you are right. I haven’t defined any specific modelling approaches once the data has been ingested and organised around those formalised business keys (hooks). After that you are free to model the data as you wish to meet specific end-user requirements. If you like dimensional modelling, fine; one-big-table, sure; if producing a PIT table helps, then go for it. Just be sure to pass through any hook values so the modelled assets are integrated.

You can think of Hook as being a version of Data Vault where we’ve removed the hub and link tables and collapsed the business keys into the satellites. The end result is the same but the table structures and processing patterns are much simpler.
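A toy illustration of that collapse, again runnable in SQLite. All names here are invented for the sketch (`hub_customer`, `frame_customer`, and the `CUST|1001` hook format are assumptions, not Hook’s actual conventions):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Data Vault shape: the hub carries the business key, the satellite the history.
# Hook shape: hub/link removed, the qualified business key (the "hook") sits
# directly on the satellite-like frame.
con.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_bk TEXT);
CREATE TABLE sat_customer (customer_hk TEXT, load_date TEXT, name TEXT);

CREATE TABLE frame_customer (hook_customer TEXT, load_date TEXT, name TEXT);

INSERT INTO hub_customer VALUES ('hk1', 'CUST|1001');
INSERT INTO sat_customer VALUES ('hk1', '2024-01-01', 'Alice');

INSERT INTO frame_customer VALUES ('CUST|1001', '2024-01-01', 'Alice');
""")

# Same question, same answer: Data Vault needs a join to reach the business
# key, while Hook reads it straight off the frame.
dv = con.execute("""
    SELECT h.customer_bk, s.name
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_hk = h.customer_hk
""").fetchall()
hook = con.execute(
    "SELECT hook_customer, name FROM frame_customer"
).fetchall()
print(dv)    # [('CUST|1001', 'Alice')]
print(hook)  # [('CUST|1001', 'Alice')]
```

The end result is identical; the Hook-style frame simply has one less table and one less join on the read path.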


You could start your own business and compete with those that do.

The best way to save costs, btw, is by solving complexities once, not repeating them over and over again in each query. The job of a data engineer is to solve this upfront; using PITs and Bridges simplifies the model for the user, the person who would otherwise be running those complexities themselves.

Simple math really.

Simple math works like this:
All queries multiplied by 0 equals 0.
Ask yourself whether a business model that charges by the amount of virtual compute really wants to optimise query performance within its engine.

There are already a lot of good companies that offer data platforms that don’t charge you per query.

I love that you figured that out

Happy to help you anytime! :grinning_face:

To this day it hasn’t happened :wink:

(lol… if your delusions keep you happy mate, I’m happy to help you too).