The Data Vault Staging Balancing Act: Purity vs. Practicality

Hello AutomateDV community,

I’m new to Data Vault 2.0 and have a foundational question about the proper use of the staging area. I understand the principle that the Raw Vault should be an unaltered, auditable copy of the source data. However, it seems some level of transformation is necessary in the staging area to create the stable business keys required for Hubs.

I’m trying to define the precise boundary between transformations that are acceptable in staging (before the Raw Vault) and those that belong in the Business Vault.
Here are my specific questions:

1. Defining “Hard Rules” vs. “Soft Rules”
Where is the definitive line drawn between technical adjustments (hard rules) and business logic (soft rules)?

  • Basic Cleansing: Are operations like trimming whitespace (TRIM), casting data types (CAST), and standardizing case (UPPER/LOWER) generally accepted as hard rules for the staging area?
  • Composite Keys: Is concatenating several source fields into a composite business key (e.g., source_code || '-' || invoice_number) still an acceptable hard rule, or does it cross into business logic? I’ve sketched what I mean below.
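
To make this concrete, here is roughly the kind of hard-rule-only staging view I have in mind. All table and column names are invented for illustration, and I realize AutomateDV would normally express these derivations through the stage macro’s derived_columns and hashed_columns configuration rather than hand-written SQL:

```sql
-- Hypothetical staging view applying only hard rules:
-- no filtering, no aggregation, no business interpretation.
-- Every source row passes through exactly once.
CREATE OR REPLACE VIEW stg_invoices AS
SELECT
    -- Basic cleansing: trim, cast, standardize case
    TRIM(invoice_number)                        AS invoice_number,
    CAST(invoice_date AS DATE)                  AS invoice_date,
    UPPER(TRIM(source_code))                    AS source_code,
    CAST(amount AS DECIMAL(18, 2))              AS amount,
    -- Composite business key via simple concatenation
    UPPER(TRIM(source_code)) || '-' || TRIM(invoice_number) AS invoice_bk,
    -- Standard Data Vault metadata
    CURRENT_TIMESTAMP                           AS load_datetime,
    'ERP_A.INVOICES'                            AS record_source
FROM raw_erp_a.invoices;
```

My reading is that nothing here changes the meaning of the data, so auditability is preserved, but I would like confirmation that the key concatenation does not already count as a soft rule.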

2. Staging Strategy for Multi-Table Sources
What is the best practice for staging when the components of a single business key are spread across multiple tables within the same source system?

  • For instance, if Orders has the primary transaction data but the Customer master table contains a required part of the business key, should I create a staging view that joins these tables?
  • If so, how do I handle this without violating the “unaltered source” principle? Is the goal to present a unified, logical record for the Raw Vault load while maintaining clear lineage back to the original tables? A rough sketch of what I mean follows below.
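
Again with made-up names, this is the kind of join I’m picturing: it only assembles the key, keeps every component column visible, and records where the row came from:

```sql
-- Hypothetical staging view: Orders holds the transaction data,
-- but one component of the business key lives on the Customer master.
CREATE OR REPLACE VIEW stg_orders AS
SELECT
    o.order_id,
    o.order_date,
    o.customer_id,
    c.customer_region,                        -- key component from Customer
    -- Composite business key spanning both tables
    c.customer_region || '-' || CAST(o.order_id AS VARCHAR) AS order_bk,
    CURRENT_TIMESTAMP                         AS load_datetime,
    'ERP_A.ORDERS'                            AS record_source
FROM raw_erp_a.orders o
-- LEFT JOIN so an order with a missing customer row still reaches
-- the Raw Vault instead of silently disappearing.
LEFT JOIN raw_erp_a.customers c
    ON o.customer_id = c.customer_id;
```

Is the LEFT JOIN the right call here, so that orphaned orders still load (presumably with a default value substituted for the missing key component) and the data quality problem stays visible downstream?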

3. Integrating Multiple Source Systems
When integrating multiple source systems (e.g., two different ERPs), should the staging area always maintain a one-to-one relationship with the source tables?

  • Is it ever advisable to create one large, combined staging view per source system (joining all its relevant tables) before loading into the Raw Vault, or does that introduce too much complexity and obscure the raw data? I’ve sketched the one-to-one alternative below.
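
To show the one-to-one alternative I’m weighing (names invented again): each source table keeps its own thin staging view, and integration happens downstream because both views emit the same business-key shape and therefore load the same Customer Hub:

```sql
-- One thin staging view per source table, per ERP.
-- Neither view knows about the other; they only agree on the
-- format of customer_bk, so both feed the same Customer Hub.
CREATE OR REPLACE VIEW stg_erp1_customers AS
SELECT
    UPPER(TRIM(cust_no))     AS customer_bk,
    CURRENT_TIMESTAMP        AS load_datetime,
    'ERP1.CUSTOMERS'         AS record_source
FROM raw_erp1.customers;

CREATE OR REPLACE VIEW stg_erp2_clients AS
SELECT
    UPPER(TRIM(client_code)) AS customer_bk,
    CURRENT_TIMESTAMP        AS load_datetime,
    'ERP2.CLIENTS'           AS record_source
FROM raw_erp2.clients;
```

The per-table version seems to keep the raw data visible and the lineage obvious, whereas one wide combined view per ERP would hide it behind joins, but I’d love to hear which trade-off works better in practice.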

I would greatly appreciate any perspectives or practical examples of how you draw these lines in your own Data Vault implementations. My goal is to build a robust and auditable Raw Vault without over-complicating the staging process.
Thank you!

There’s a lot to unpack here. May I suggest you watch this video?

Thanks @patrickcuba! I will look into it.