Effective From vs Applied Date

Hi all

Recently at our meetup (December 8th) we announced some new changes to dbtvault. One of these changes was changing “Effective From” in Satellites and Effectivity Satellites to “Applied Date”.

“Applied Date” has been suggested by Dan Linstedt, see below:

Microsoft Word - DVModelingSpecs2.0.1 (danlinstedt.com), Section 2.9

Now, this doesn’t matter so much in terms of your final table structures, as YOU provide the name for the column, however the parameter is still called src_eff, and we want to change it to src_apl or something similar. This will also require an update to our documentation.

We are making this change primarily because historically, Effective From has been confusing for a number of our users and clients, as it conflicts with existing column names or their understanding of business concepts and processes.

I’d like this topic to be about exploring options, and discussing a way forward. “Applied Date” is not yet set in stone.

In one sense it doesn’t matter what you name your columns - as long as you are consistent and clear in the definition of the column. That said there are some conventions - some set by Dan, others used by vendors.

All DV2.0 tables should have a LOAD_DATETIME column and this has a strict definition - the Datetime the record was loaded to the Data Vault. You might name this column loadDateTime, or LOAD_DATETIME, or any other naming convention you use on your project. I like Dan’s use of APPLIED_DATE for the business applied date, but I realise this may have problems in translation. I’ve found EFFECTIVE_FROM confuses some teams because their business has a specific meaning for that term.

I suppose we can extend the range of possible dates we could load - e.g. EXTRACTION_DATETIME (from the source), BATCH_DATETIME (of the batch run), etc. Again be consistent across the model if used.

1 Like

Applied date can be thought of as a package of data and its applicability. It’s nothing more than a snapshot of the data at a point in time — similar to snapshot dates in PITs. I did a presentation on the topic a few months ago.

When it becomes relevant to EFS is when the source app does not supply a business date tracking the relationship changes according to a driving key (s). Either a gap in the source or the business wants to track the driving to non-driving relationship different to how the source does it.

The best result is if the source does supply the effectivity because if you have to do it in DV then you must apply the driving key + EFS as early as possible or you could miss when a relationship returns to the original state — there is no way to track that in the link itself because it is the unique list of relationships.

EFS can only exist off a link table, because a superseding relationship change the link-hash-key, attempting the same thing if it were off a hub table will contain duplicate loads. And besides we have STS and RTS that can be used to better track INSERT-ONLY data tables.





1 Like

Dan’s latest doco is at 2.0.2 :slight_smile:

found it, publicly available

For anyone not sure of the TLAs :slightly_smiling_face::

EFS = Effectivity Satellite
RTS = Record Tracking Satellite
STS = Status Tracking Satellite

Alex, thank you for your update on dbtVault. dbtVault is being well spoken of from the customers that we’ve encountered that are using it. So congratulations!
Neil, I agree with you that Applied Date is not to be confused with Load Date, which does have a very specific definition and use within the Data Vault 2.0 implementation. Thanks for keeping everyone on the right path.
With regard to Applied Date, this date becomes critical in at least two specific scenarios that don’t necessarily have anything to do with Effectivity Satellites.
The first is the situation where the business has “discovered” a mass of historical data that is static. In other words, the data comes from an archived or legacy system that is no longer active or being updated. In this case, the data may cover any number of days, months, years; it may be stored at an aggregated level which impacts its alignment with the current data at an atomic level.
All this to say, when this static data is loaded, the Load Date column retains its definition - it is the date and time when the record first appeared in the warehouse - period. Technically, all of the static historical data could be loaded to the warehouse in a single day - even through a single process - in which case, the Load Date (if batch load driven) will very likely be identical across the entire data set - as it should be. Which begs the question, how does the warehouse team handle the time interval represented by the data set?
If you’re fortunate enough to have a date or timestamp in the operational data, then this question does not necessarily pose a particularly difficult challenge because the data set itself holds the answer on the temporal nature of each record. If the data set does not hold a date, then the concept of an Applied Date becomes critical. Adding an Applied Date attribute enables the data consumers to group data and place data into an appropriate moment in the timeline. The Applied Date can be populated through business logic or some other derivation process if it is not provided by the data set. If it is derived, we strongly recommend having a Service Level Agreement in place with the business that defines how the Applied Date will be derived, and that information should be published for all consumers to reference.
The second situation has to do with real time data being passed across some type of message queuing pipeline and into the raw data vault. Data that is moving through a message queue, by nature, will arrive in the raw data vault out of order because of the manner in which messages are processed off the queue and the dynamic nature of message threads. Similar to the historical data, the Load Date for real time messaging data is the date and time that the data was first inserted into the target data vault object. For this reason, the load date cannot be used to order records for reporting outcomes because using the load date in this fashion will improperly align the data across time.
DV2 recommended best practice is to ensure that your processing logic grabs the message transaction timestamp and the message transaction id as part of the payload. The transaction timestamp should represent the timestamp that the message was pushed onto the network - which means the message itself is timestamped prior to arriving at the message queue. This transactional timestamp is inserted into the Applied Date column and can be used to properly align the sequence of the transactions in business outcomes and queries downstream.
Hopefully, this helps to provide more context on how dbtVault might consider and handle an Applied Date going forward from a DV2 standards perspective.

4 Likes

Hi Alex, Both Effective From (src_eff) in the dbtvault and “Applied Date” suggested by Dan are same, Is my understanding correct? thanks!

Yes they’re the same.

1 Like

thank you @neil.strange