Why are hubs, links and satellites separate tables?

Please stay on topic

Lol, I did; I realise you’ve missed what I had already shown you. Go back to the blog and search for the keyword “JoinFilter” – this is how Snowflake indicates that it is using Bloom filters underneath.
And then you can watch the video and watch how I did it.

Amazing, you’re learning more than one thing today.

Can you provide me with the sql code that extracts the latest bk/hk from a satellite without using any type of preprocessed storing technique? If you have any other technique I am interested.

Ask yourself why we do pre-processing (building CPITs, PITs and the excellent SNOPIT) versus why you would force users to select the max per parent key in every query… cost. Solve that complexity once and you will cut costs. For an experienced architect the benefits should be obvious. The examples I provided are excellent and even include SQL code for your benefit.
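To make the trade-off concrete, here is a minimal sketch of both approaches, demonstrated with SQLite so it actually runs. The table and column names (`sat_customer`, `customer_hk`, `load_date`, `pit_customer`) are illustrative, not from any of the posts above:

```python
import sqlite3

# Hypothetical satellite: one row per hash key per load date.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sat_customer (
    customer_hk TEXT,
    load_date   TEXT,
    name        TEXT
);
INSERT INTO sat_customer VALUES
    ('hk1', '2024-01-01', 'Alice'),
    ('hk1', '2024-02-01', 'Alicia'),
    ('hk2', '2024-01-15', 'Bob');
""")

# Per-query approach: every consumer repeats the MAX(load_date) per parent key.
latest = con.execute("""
    SELECT s.customer_hk, s.name
    FROM sat_customer s
    JOIN (SELECT customer_hk, MAX(load_date) AS load_date
          FROM sat_customer
          GROUP BY customer_hk) m
      ON s.customer_hk = m.customer_hk AND s.load_date = m.load_date
    ORDER BY s.customer_hk
""").fetchall()
print(latest)  # [('hk1', 'Alicia'), ('hk2', 'Bob')]

# PIT-style approach: solve the complexity once, store the pointers,
# and every downstream query becomes a cheap equi-join.
con.execute("""
    CREATE TABLE pit_customer AS
    SELECT customer_hk, MAX(load_date) AS load_date
    FROM sat_customer
    GROUP BY customer_hk
""")
latest_via_pit = con.execute("""
    SELECT s.customer_hk, s.name
    FROM pit_customer p
    JOIN sat_customer s
      ON s.customer_hk = p.customer_hk AND s.load_date = p.load_date
    ORDER BY s.customer_hk
""").fetchall()
print(latest_via_pit)  # same result, via the precomputed table
```

Both queries return the same rows; the difference is whether the aggregate is paid for on every query or once at load time.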

The reason I used the aggregate as the comparison is that it is something we always have to do in Data Vault. I don’t know if Hook has something similar to PIT tables, but at least I know that Hook stores all business keys in the satellite, or frame; Andrew, please correct me if I am wrong here. So it made sense to compare these constructs.

Regarding cost: there are many factors that drive the costs of a data platform. I don’t have a straight answer because it is very dependent on the implementation. It might be good to use PITs to pre-aggregate data for equi-joins, but PITs will also add complexity with more tables, loading processes and dependencies.
A good way to cut costs is to use a payment model where you don’t pay per query.

Hey Andreas,

Yes you are right. I haven’t defined any specific modelling approaches once the data has been ingested and organised around those formalised business keys (hooks). After that you are free to model the data as you wish to meet specific end-user requirements. If you like dimensional modelling, fine; one-big-table, sure; if producing a PIT table helps, then go for it. Just be sure to pass through any hook values so the modelled assets are integrated.

You can think of Hook as being a version of Data Vault where we’ve removed the hub and link tables and collapsed the business keys into the satellites. The end result is the same but the table structures and processing patterns are much simpler.
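A toy illustration of that collapse, again runnable in SQLite. All names here are invented for the sketch (`hub_customer`, `frame_customer`, and the `CUST|1001` hook format are assumptions, not Hook’s actual conventions):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Data Vault shape: the hub carries the business key, the satellite the history.
# Hook shape: hub/link removed, the qualified business key (the "hook") sits
# directly on the satellite-like frame.
con.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_bk TEXT);
CREATE TABLE sat_customer (customer_hk TEXT, load_date TEXT, name TEXT);

CREATE TABLE frame_customer (hook_customer TEXT, load_date TEXT, name TEXT);

INSERT INTO hub_customer VALUES ('hk1', 'CUST|1001');
INSERT INTO sat_customer VALUES ('hk1', '2024-01-01', 'Alice');

INSERT INTO frame_customer VALUES ('CUST|1001', '2024-01-01', 'Alice');
""")

# Same question, same answer: Data Vault needs a join to reach the business
# key, while Hook reads it straight off the frame.
dv = con.execute("""
    SELECT h.customer_bk, s.name
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_hk = h.customer_hk
""").fetchall()
hook = con.execute(
    "SELECT hook_customer, name FROM frame_customer"
).fetchall()
print(dv)    # [('CUST|1001', 'Alice')]
print(hook)  # [('CUST|1001', 'Alice')]
```

The end result is identical; the Hook-style frame simply has one less table and one less join on the read path.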


You could start your own business and compete with those that do.

The best way to save costs, btw, is by solving complexities once, not repeating them over and over again in each query. The job of a data engineer is to solve this upfront; using PITs and Bridges simplifies the model for the user, the person who would otherwise be running those complexities themselves.

Simple math really.

Simple math works like this:
All queries multiplied by 0 equals 0.
Ask yourself whether a business model that charges by the amount of virtual compute really wants to optimise query performance within its engine.

There are already a lot of good companies that offer data platforms that don’t charge you per query.

I love that you figured that out

Happy to help you anytime! :grinning_face:

To this day it hasn’t happened :wink:

(lol… if your delusions keep you happy mate, I’m happy to help you too).