Hi all,
I hope you’re well. I have recently joined this forum because I have been helping the design of a DataVault 2.0 solution for an insurer and am needing to demonstrate some very specific functionality that I hope DataVault 2.0 can assist me with.
Patrick Cuba, thank you for the articles I have read after exploring this forum and the associated Medium articles. The problem I have is that I now have been shown so many ways to do the same thing that I no longer know which one to choose.
In Insurance, policies are created and updated in a policy system through a series of transactions, normally. New Business, Endorsement (make a change), Renewals and Cancellations.
Each policy system we have (5 legacy and 1 current/future state) behaves a bit differently and we get data delivered to our data lake in batches and are usually all the necessary tables of the policy system related to a policy.
Problem 1: Staging Policy Transaction Data
In order to create an effective way of staging the data, we have to synthesise the transactions by taking all the records that we have been supplied using the effective date for each table, which is really painful, so, we’re having issues trying to figure out how stage it.
Problem 2: Current and Point in Time Data
I have been asked to demonstrate how from all the historical data, I can show two different dimensions of the data, both current and then based upon a point in time.
My understanding is that ‘point in time’ is what data existed at a specific point in time, like restoring a backup to a specific date and then querying that data and maybe comparing it to current with the same query.
The confusion is that, on policy transactions, there is also this value called effective_date, which is when the changes a transaction created become active. So, today I can create a cancellation transaction to cancel next Thursday, for example. It’s a business value, not a technical concept so, I’ve been avoiding this whole ‘applied date’ concept because effective date of transactions has nothing to do with late arriving data at all, it’s a parameter just like brand and policy number.
Problem 3: Demonstrating raw data vault to Data Mart
It’s been a challenge to explain that all the complexity of data vault and all the historical data it stores is not going to impact the end users who would use fact and dimension Kimbal type models.
I am not a developer and want to be able to demonstrate my example data I put together as shown below as hubs and satellites linked together and show both a current and point in time view of the data in a data mart format. I am keen to get a basic example put together that shows how the data is consumable so I can sell the methodology.
Unfortunately, it’s a chicken and egg, they won’t invest in the training and resources for data vault until they see the value in it, I am trying to show the value in it and although I’ve done 10 hours of training and have a background in data architecture, my technical ability is somewhat lacking to show this.
Our data lake is in Google Cloud and we are using DBT. We would prefer to use AutomateDV where possible since we want to be compliant with the methodology properly.
Does anyone have any basic example of two modules linked together with hubs, satellites and pit tables that I can modify to reflect my demonstration as shown below using DBT and Google Cloud?
Thank you kindly in advance, any advice and feedback greatly appreciated
I was hoping to share with you a diagram of what I have but I am blocked from this computer to do so, so I share with you the example data I wanted to demonstrate.
Example Data
Below are just two modules that I have shown an end state of where there are 6 key dates for the creation and updates to a policy and a party:
hub_policy_data = [
{"hk_policy": "P10", "policy_id": 101, "policy_number": "P101", "customer_id": 201, "load_dts": "2023-01-01 10:00:00"},
{"hk_policy": "P20","policy_id": 102, "policy_number": "P102", "customer_id": 202, "load_dts": "2023-01-01 11:00:00"}
],
sat_policy_terms_data = [
{"hk_policy": "P10", "policy_id": 101, "term_start": "2023-01-01", "term_end": "2023-12-31", "effective_date": "2023-01-01", "policy_status": "ACTIVE", "load_dts" : "2023-01-01 10:00:00"},
{"hk_policy": "P20", "policy_id": 102, "term_start": "2023-01-01", "term_end": "2023-12-31", "effective_date": "2023-01-01", "policy_status": "ACTIVE", "load_dts" : "2023-01-01 11:00:00"},
{"hk_policy": "P10", "policy_id": 101, "term_start": "2023-01-01", "term_end": "023-03-15", "effective_date": "2023-03-15", "policy_status": "CANCELLED", "load_dts" : "2023-03-08 15:00:00"},
{"hk_policy": "P10", "policy_id": 101, "term_start": "2023-01-01", "term_end": "2023-12-31", "effective_date": "2023-01-01", "policy_status": "ACTIVE", "load_dts" : "2023-03-20 10:00:00"}
],
sat_annu_premium_data = [
{"hk_policy": "P10", "total_premium": 321.64, "load_dts": "2023-01-01 10:00:00"},
{"hk_policy": "P20", "total_premium": 228.15, "load_dts": "2023-01-01 11:00:00"}
],
pit_policy_data = [
{"hk_policy": "P10", "term_load_dts": "2023-01-01 10:00:00", "an_prem_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-01-01 10:00:00"},
{"hk_policy": "P20", "term_load_dts": "2023-01-01 11:00:00", "an_prem_load_dts": "2023-01-01 11:00:00", "load_dts": "2023-01-01 11:00:00"},
{"hk_policy": "P10", "term_load_dts": "2023-03-08 15:00:00", "an_prem_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-03-08 15:00:00"},
{"hk_policy": "P10", "term_load_dts": "2023-03-20 15:00:00", "an_prem_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-03-20 15:00:00"}
],
hub_insured_parties_data = [
{"hk_insured": "IN10", "dob": "1978-12-22", "load_dts": "2023-01-01 10:00:00"},
{"hk_insured": "IN20", "dob": "1978-12-22", "load_dts": "2023-01-01 10:00:00"},
{"hk_insured": "IN30", "dob": "1978-12-22", "load_dts": "2023-01-01 11:00:00"},
{"hk_insured": "IN40", "dob": "1978-12-22", "load_dts": "2023-04-10 10:00:00"},
],
sat_party_details_data = [
{"hk_insured": "IN10", "name": "John West", "dob": "1978-12-22", "load_dts": "2023-01-01 10:00:00"},
{"hk_insured": "IN20", "name": "Jan Leeming", "dob": "1973-11-01", "load_dts": "2023-01-01 10:00:00"},
{"hk_insured": "IN30", "name": "Ella Reasoning", "dob": "1965-04-05", "load_dts": "2023-01-01 11:00:00"},
{"hk_insured": "IN30", "name": "Ellana Briggs", "dob": "1965-04-05", "load_dts": "2023-02-10 10:00:00"},
{"hk_insured": "IN40", "name": "Bill Gates", "dob": "1971-06-17", "load_dts": "2023-04-10 10:00:00"},
],
pit_party_details_data = [
{"hk_insured": "IN10", "details_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-01-01 10:00:00"},
{"hk_insured": "IN20", "details_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-01-01 10:00:00"},
{"hk_insured": "IN30", "details_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-01-01 11:00:00"},
{"hk_insured": "IN30", "details_load_dts": "2023-02-10 10:00:00", "load_dts": "2023-02-10 10:00:00"},
{"hk_insured": "IN40", "details_load_dts": "2023-04-10 10:00:00", "load_dts": "2023-04-10 10:00:00"},
],
link_policy_insured_data = [
{"hk_policy_insured": "P10-IN10", "hk_policy": "P10", "hk_insured": "IN10", "valid_from": "2023-01-01 10:00:00", "valid_from": "2999-01-01 00:00:00", load_dts": "2023-01-01 10:00:00"},
{"hk_policy_insured": "P10-IN20", "hk_policy": "P10", "hk_insured": "IN20", "valid_from": "2023-01-01 10:00:00", "valid_from": "2999-01-01 00:00:00", "load_dts": "2023-01-01 10:00:00"},
{"hk_policy_insured": "P20-IN30", "hk_policy": "P20", "hk_insured": "IN30", "valid_from": "2023-01-01 11:00:00", "valid_from": "2999-01-01 00:00:00", "load_dts": "2023-01-01 11:00:00"},
{"hk_policy_insured": "P10-IN20", "hk_policy": "P10", "hk_insured": "IN20", "valid_from": "2023-01-01 10:00:00", "valid_from": "2023-04-10 10:00:00", "load_dts": "2023-04-10 10:00:00"}, --replaced
{"hk_policy_insured": "P10-IN40", "hk_policy": "P10", "hk_insured": "IN40", "valid_from": "2023-04-10 10:00:00", "valid_from": "2999-01-01 00:00:00", "load_dts": "2023-04-10 10:00:00"},
]