Requesting Feedback on my Insurance Policy Example

Hi all,
I hope you’re well. I have recently joined this forum because I have been helping the design of a DataVault 2.0 solution for an insurer and am needing to demonstrate some very specific functionality that I hope DataVault 2.0 can assist me with.

Patrick Cuba, thank you for the articles I have read after exploring this forum and the associated Medium articles. The problem I have is that I now have been shown so many ways to do the same thing that I no longer know which one to choose.

In Insurance, policies are created and updated in a policy system through a series of transactions, normally. New Business, Endorsement (make a change), Renewals and Cancellations.

Each policy system we have (5 legacy and 1 current/future state) behaves a bit differently and we get data delivered to our data lake in batches and are usually all the necessary tables of the policy system related to a policy.

Problem 1: Staging Policy Transaction Data
In order to create an effective way of staging the data, we have to synthesise the transactions by taking all the records that we have been supplied using the effective date for each table, which is really painful, so, we’re having issues trying to figure out how stage it.

Problem 2: Current and Point in Time Data
I have been asked to demonstrate how from all the historical data, I can show two different dimensions of the data, both current and then based upon a point in time.

My understanding is that ‘point in time’ is what data existed at a specific point in time, like restoring a backup to a specific date and then querying that data and maybe comparing it to current with the same query.

The confusion is that, on policy transactions, there is also this value called effective_date, which is when the changes a transaction created become active. So, today I can create a cancellation transaction to cancel next Thursday, for example. It’s a business value, not a technical concept so, I’ve been avoiding this whole ‘applied date’ concept because effective date of transactions has nothing to do with late arriving data at all, it’s a parameter just like brand and policy number.

Problem 3: Demonstrating raw data vault to Data Mart
It’s been a challenge to explain that all the complexity of data vault and all the historical data it stores is not going to impact the end users who would use fact and dimension Kimbal type models.

I am not a developer and want to be able to demonstrate my example data I put together as shown below as hubs and satellites linked together and show both a current and point in time view of the data in a data mart format. I am keen to get a basic example put together that shows how the data is consumable so I can sell the methodology.

Unfortunately, it’s a chicken and egg, they won’t invest in the training and resources for data vault until they see the value in it, I am trying to show the value in it and although I’ve done 10 hours of training and have a background in data architecture, my technical ability is somewhat lacking to show this.

Our data lake is in Google Cloud and we are using DBT. We would prefer to use AutomateDV where possible since we want to be compliant with the methodology properly.

Does anyone have any basic example of two modules linked together with hubs, satellites and pit tables that I can modify to reflect my demonstration as shown below using DBT and Google Cloud?

Thank you kindly in advance, any advice and feedback greatly appreciated

I was hoping to share with you a diagram of what I have but I am blocked from this computer to do so, so I share with you the example data I wanted to demonstrate.

Example Data
Below are just two modules that I have shown an end state of where there are 6 key dates for the creation and updates to a policy and a party:

hub_policy_data = [
    {"hk_policy": "P10", "policy_id": 101, "policy_number": "P101", "customer_id": 201, "load_dts": "2023-01-01 10:00:00"},
    {"hk_policy": "P20","policy_id": 102, "policy_number": "P102", "customer_id": 202, "load_dts": "2023-01-01 11:00:00"}
],
sat_policy_terms_data = [
    {"hk_policy": "P10", "policy_id": 101, "term_start": "2023-01-01", "term_end": "2023-12-31", "effective_date": "2023-01-01", "policy_status": "ACTIVE", "load_dts" : "2023-01-01 10:00:00"},
    {"hk_policy": "P20", "policy_id": 102, "term_start": "2023-01-01", "term_end": "2023-12-31", "effective_date": "2023-01-01", "policy_status": "ACTIVE", "load_dts" : "2023-01-01 11:00:00"},
    {"hk_policy": "P10", "policy_id": 101, "term_start": "2023-01-01", "term_end": "023-03-15", "effective_date": "2023-03-15", "policy_status": "CANCELLED", "load_dts" : "2023-03-08 15:00:00"},
    {"hk_policy": "P10", "policy_id": 101, "term_start": "2023-01-01", "term_end": "2023-12-31", "effective_date": "2023-01-01", "policy_status": "ACTIVE", "load_dts" : "2023-03-20 10:00:00"}
],
sat_annu_premium_data = [
    {"hk_policy": "P10", "total_premium": 321.64, "load_dts": "2023-01-01 10:00:00"},
    {"hk_policy": "P20", "total_premium": 228.15, "load_dts": "2023-01-01 11:00:00"}
],
pit_policy_data = [
    {"hk_policy": "P10", "term_load_dts": "2023-01-01 10:00:00", "an_prem_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-01-01 10:00:00"},
    {"hk_policy": "P20", "term_load_dts": "2023-01-01 11:00:00", "an_prem_load_dts": "2023-01-01 11:00:00", "load_dts": "2023-01-01 11:00:00"},
    {"hk_policy": "P10", "term_load_dts": "2023-03-08 15:00:00", "an_prem_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-03-08 15:00:00"},
    {"hk_policy": "P10", "term_load_dts": "2023-03-20 15:00:00", "an_prem_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-03-20 15:00:00"}
],
hub_insured_parties_data = [
    {"hk_insured": "IN10", "dob": "1978-12-22", "load_dts": "2023-01-01 10:00:00"},
    {"hk_insured": "IN20", "dob": "1978-12-22", "load_dts": "2023-01-01 10:00:00"},
    {"hk_insured": "IN30", "dob": "1978-12-22", "load_dts": "2023-01-01 11:00:00"},
    {"hk_insured": "IN40", "dob": "1978-12-22", "load_dts": "2023-04-10 10:00:00"},
],
sat_party_details_data = [
    {"hk_insured": "IN10", "name": "John West", "dob": "1978-12-22", "load_dts": "2023-01-01 10:00:00"},
    {"hk_insured": "IN20", "name": "Jan Leeming", "dob": "1973-11-01", "load_dts": "2023-01-01 10:00:00"},
    {"hk_insured": "IN30", "name": "Ella Reasoning", "dob": "1965-04-05", "load_dts": "2023-01-01 11:00:00"},
    {"hk_insured": "IN30", "name": "Ellana Briggs", "dob": "1965-04-05", "load_dts": "2023-02-10 10:00:00"},
    {"hk_insured": "IN40", "name": "Bill Gates", "dob": "1971-06-17", "load_dts": "2023-04-10 10:00:00"},
],
pit_party_details_data = [
    {"hk_insured": "IN10", "details_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-01-01 10:00:00"},
    {"hk_insured": "IN20", "details_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-01-01 10:00:00"},
    {"hk_insured": "IN30", "details_load_dts": "2023-01-01 10:00:00", "load_dts": "2023-01-01 11:00:00"},
    {"hk_insured": "IN30", "details_load_dts": "2023-02-10 10:00:00", "load_dts": "2023-02-10 10:00:00"},
    {"hk_insured": "IN40", "details_load_dts": "2023-04-10 10:00:00", "load_dts": "2023-04-10 10:00:00"},
],
link_policy_insured_data = [
    {"hk_policy_insured": "P10-IN10", "hk_policy": "P10", "hk_insured": "IN10", "valid_from": "2023-01-01 10:00:00", "valid_from": "2999-01-01 00:00:00", load_dts": "2023-01-01 10:00:00"},
    {"hk_policy_insured": "P10-IN20", "hk_policy": "P10", "hk_insured": "IN20", "valid_from": "2023-01-01 10:00:00", "valid_from": "2999-01-01 00:00:00", "load_dts": "2023-01-01 10:00:00"},
    {"hk_policy_insured": "P20-IN30", "hk_policy": "P20", "hk_insured": "IN30", "valid_from": "2023-01-01 11:00:00", "valid_from": "2999-01-01 00:00:00", "load_dts": "2023-01-01 11:00:00"},
    {"hk_policy_insured": "P10-IN20", "hk_policy": "P10", "hk_insured": "IN20", "valid_from": "2023-01-01 10:00:00", "valid_from": "2023-04-10 10:00:00", "load_dts": "2023-04-10 10:00:00"}, --replaced
    {"hk_policy_insured": "P10-IN40", "hk_policy": "P10", "hk_insured": "IN40", "valid_from": "2023-04-10 10:00:00", "valid_from": "2999-01-01 00:00:00", "load_dts": "2023-04-10 10:00:00"},
]

In light of the fact I wasn’t able to share a diagram from this computer, I have instead created a plantuml script that will produce the diagram if required. Thanks

@startuml Raw Data Vault

entity "HUB_POLICY" {
    * hash_policy_id : string
    --
    policy_id : int
    policy_number : string
    customer_id : int
    load_dts : datetime
}

entity "SAT_POLICY_TERM" {
    * hk_policy : string
    --
    term_start : date
    term_end : date
    effective_date : date
    policy_status : string
    load_dts : datetime
}

entity "SAT_POLICY_ANNU_PREMIUMS" {
    * hk_policy : string
    --
    total_premium : float
    load_dts : datetime
}

entity "PIT_POLICY" {
    * hk_policy : string
    --
    load_dts : datetime
    term_load_dts : datetime
    an_prem_load_dts : datetime
}

entity "HUB_INSURED_PARTIES" {
    * hk_insured : string
    --
    dob : date
    load_dts : datetime
}

entity "SAT_PARTY_DETAILS" {
    * hk_insured : string
    --
    name : string
    dob : date
    load_dts : datetime
}

entity "PIT_INSURED_PARTIES" {
    * hk_insured : string
    --
    load_dts : datetime
    details_load_dts : datetime
}

entity "LINK_POLICY_INSURED_PARTIES" {
    * hk_policy_insured : string
    --
    hk_policy : string
    hk_insured : string
    valid_from : datetime
    valid_to : datetime
    load_dts : datetime
}

HUB_POLICY ||--o{ SAT_POLICY_TERM
HUB_POLICY ||--o{ SAT_POLICY_ANNU_PREMIUMS
HUB_POLICY ||--|| PIT_POLICY
SAT_POLICY_TERM ||--|| PIT_POLICY
SAT_POLICY_ANNU_PREMIUMS ||--|| PIT_POLICY
HUB_POLICY ||--o{ LINK_POLICY_INSURED_PARTIES
HUB_INSURED_PARTIES ||--o{ SAT_PARTY_DETAILS
HUB_INSURED_PARTIES ||--|| PIT_INSURED_PARTIES
SAT_PARTY_DETAILS ||--|| PIT_INSURED_PARTIES
HUB_INSURED_PARTIES ||--o{ LINK_POLICY_INSURED_PARTIES


@enduml

I discuss this multi-temporal problem in my book, yes never use business dates as an applied date because it is unreliable, i.e. you can effective dates in the future and thus ordering your bi-temporal vault by this date will make your data useless at best, devastating at worst.

End users will not use the DV unless they’re DV savvy, DV is for EDW — Kimball lives on top of a DV and because its backed by the DV it makes the Kimbal model disposable

@Shagility has a background with DVs on Big Query, chase him.

1 Like

Thanks for your input, thanks for clarifying the date thing, i view business data as business data, not to be messed with, dates, references, etc. all to be queried like any other database, not specific to datavault2.0.

One thing i am interested is your view on staging historical data when it comes to transactions that are basically requests to create or change something like a policy.

The data changed is in its current state and the transactions are just historical records that show that someone at sometime did a specific type of transaction lile renew a policy, change one or cancel it.

It would appear that certain things need to be synthesized like, prior to a cancellation or renewal transaction, the policy must have been active.

From a time context based upon transactions (interaction records) the snapshot invariably doesnt show you the current state of a policy but simply that at a particular point in time a transaction occured that will change the policy with now, in the future or past.

If i have a transaction that cancels tomorrow, today i need to see that it is active then i know have this new transaction record that intends to change it to what it requested.

You see my confusion of how to approach this? Is there anyone who has insuramce experience that might know more about addressing this? Its so very different to goods and sales transactions like invoices, receipts, purchase orders, bills, etc.

Thanks again (apologies for typos, typing without glasses on)

100%… you’re on the right path padawan;

As for the latter — this sounds like a business question — seriously — you cannot solve this without business input; your business must specify how to handle this because they are the custodians of their corporate memory and therefore must be held accountable to their decisions.

My recommendation is to workshop the issue with your customer, and describe what the right path is. You have bad data now, but it will cease to be bad data in the future… if your’re lucky you will need to deal with ephemeral technical debt — i.e. the older the data gets the less relevant it becomes and you can archive that old data as “before data vault”.