Hi there JLeicht, welcome and thanks for posting this!
This is a really good set of questions. Your example makes sense, so I’ll walk through it using the same structure with my understanding.
What applied timestamp actually represents
Applied timestamp is not a business-valid timestamp and it is not the same as load timestamp.
It represents the logical “as-of” time that a batch of data is associated with, typically the start of the ingestion or extraction process. It is assigned to the batch as metadata and does not describe when individual rows are physically written to the Data Vault.
That has two important consequences:
-
It is set once per batch, not per row
-
It reflects when the warehouse contextually learned about the data, not when rows were written or when the data was true in the source
So let’s go through all your questions!
Question 1: What determines the applied timestamp?
In practice, it is almost always the start time of the extraction or ingestion job.
So your first interpretation is correct:
-
Extraction starts at 2025-12-20 10:00
-
Data lands shortly after
So as you said the applied timestamp is 10:00 and the load timestamp is the physical insert time (10:05 in your example).
It is not derived from WHERE clauses or source valid_from fields.
Using a source column would break the “batch represents a single moment” rule.
Question 2: What happens on an initial load?
An initial load is simply a large batch.
If you ingest 1,000,000 rows in an initial load, all rows receive the same applied timestamp, because they all belong to the same batch. This is independent of the load timestamp, which simply records when each row was physically written.
This does not imply that the data was all true at that point in business time. It only indicates when the warehouse first became aware of the data.
Question 3: What happens with CDC and corrections?
In your example, the concern is not really about CDC mechanics. It’s about this observation:
Two rows were wrong yesterday, corrected today, but the applied timestamp moves forward. That makes it look like the values only changed today but aren’t a correction.
That observation is correct, and it is intentional.
The applied timestamp advances because the correction arrives as part of a new batch. The vault is recording when the correction became known, not rewriting the past.
The key point is that the applied timestamp is not the mechanism that preserves attribute history.
That responsibility sits with the satellite:
-
The satellite’s primary key anchors the row to the business key
-
The hashdiff detects that the attribute values have changed
-
A new SAT row is inserted with a later load timestamp
When querying the satellite using standard “latest record” logic, the corrected value becomes the current state, even though it arrived later. This is alluded to by @emanueol in his QUALIFY statement.
This allows the vault to support corrected historical data without pretending the warehouse knew the correct value earlier than it did. That would be very naughty.
If you need to express that the corrected value was already valid in the past, that belongs in the business vault using business-effective dates (usually some equivalent to a modified/updated at timestamp) or restatement logic. It is deliberately not handled by applied timestamp.
A useful way to think about it in this scenario:
-
Applied timestamp tells you when a batch of data was applied to the vault
-
Load timestamp tells you when a specific row was physically written
-
Satellites (via PK + hashdiff) tell you when attribute values changed
Together, this allows the vault to show both:
-
what was known at a given point in time
-
and when corrected information became available
If you also need to model when a value was valid in business time, that is handled separately using business-effective logic, not applied timestamp.
Ultimately, you can rename these columns for clarity, just as long as you are consistent!
This thread also goes through some of this, may provide some extra context, and includes some wise words from the author of The Data Vault Guru himself:
Effective From vs Applied Date
Hope this all helps!
Joe