Restructuring Source Data

Should you ever split the source data table into multiple hubs/sats (unless PII)?

Would this be considered to be restructuring the source data in the raw data vault?

Let’s say you have a table coming from yet another CRM that represents the contact, however it is highly de-normalised and contains the role, organisation, and location address as well.

There are no business keys other than the surrogate key that maintains the contact record and their attributes.

If I attach the source table to the contact hub with its descriptive data in a satellite, I have created the least amount of tables (and joins), both in terms of hubs, links, and satellites.

I assume that this is correct?

It depends. But yes you split source tables across hubs and links where the source table maps to multiple concepts.

Some source tables are messy. Most source tables tend to map to one main concept plus a few fks (links and referenced hubs).

Attributes go into one or more satellites attached to the mapped hubs and links where attributes describe those concepts. But as a source table usually has attributes at one grain you’ll find that most if not all attributes will go into a satellite attached to the main concept covered by the source (other than metadata you might want to capture such as Status Tracking, and Link effectivity satellites …). You can further break down satellites to separate PII data or frequently changing attributes from infrequent ones depending on your taste and local standards.

1 Like