Source Tables - joining transactional tables or not?

Sorry - I didn’t mean to insinuate you had led me astray. I guess it comes down to there being lots of documentation around hubs, links and satellites, but not much leading up to that.

Then there are what appear at face value to be conflicting opinions. As you say, the Raw Vault shouldn’t modify the original data besides applying hard rules. However this page suggests that ‘One of the biggest causes of project failure, is designing and mapping a model that mimics the source system too closely.’. Additionally, the example database for dbtvault specifically does join transactional tables in advance of the data flowing into the raw vault.

The example I gave of Salesforce table was probably a little loaded. I would argue that the brand should be a hub. But the record_type table doesn’t just contain brand. It contains brand on records joined to the opportunity table. If the same record_type table is joined to the appointment table, it provides the type of appointment - new lead, follow-up etc. So creating a hub simply off the record_type table seems like a no-go.

Using record_type again, where the record_type could be considered a hub (such as brand), I’d argue that it is the brand itself (i.e. the record_type name) that would be the hub business key, not the SF PK. The brand (as text) would doubtless appear in countless other areas and would be the brand name in those locations. If we didn’t join the record_type table directly to the opportunity table as presented above, we would still need to join it to pull through the brand name (as opposed to the record_type_id) to create the link table between brand and opportunity.

I’m sure this is all very simple. But with the seemingly conflicting approaches to staging, I feel like I’m going around in circles. I’m stuck at the first step in the journey, even though I feel like I now have a relatively good understanding of steps 2, 3, 4 & 5.