Looking for some suggestions please how to deal with this situation we have come across in satellite(of hub).
While loading source records for zero keys(i.e. null business key) in Satellite, we can see multiple records can be there for same zero key. It may also happen that in same run such multiple records are inserted with same hash key(w.r.t zero key) and Load ts which will violate satellite’s primary key.
We are also trying to check from business perspective if such records without correct business key are at all useful!
What if we simply leave these in lower data layer and don’t load into Satellite? Are we then violating the Data vault 2.0 principle to ingest all as-is source data in Raw data vault?
(certainly These should be a feed back to source system flagging such records.)
Meanwhile we are trying to figure out the usage of such records, I wanted to understand from this forum if someone has come across the same situation and somehow managed to handle it in satellite.
Hi - If I understood your scenario, U have a source OLTP system with a Policy table having Null or empty Policy_Id which is being identified as the BK - correct?
How does your source OLTP deal with these zero Policy_Id records which I assume relate to actual Policies and have non-null business attributes recorded against them?
Do these zero Policy_id recs eventually get filled with actual late-arriving Policy_Id’s, pls?
The source is a data platform (not really an OLTP). They are sending all data to us irrespective of business key is present or not. These are obviously error records.The expectation is eventually the correct policy id will flow in which is fine . But until then there might be records sitting in sat with business key column as -1 and if such multiple records are loaded in same run then sat PK will be violated.
So U will load all these zero BK source recs (where the Policy_id has not yet arrived) into a Hub with UNKNOWN BK = -1 but with Sat’s having valid business attributes.
When the Policy_Id rec does arrive, U will load into its own Hub & Sat & continue normally thereafter - am I right?
OK. So for Policy7 which is late-arriving, U might have loaded, say 3 SAT_POLICY records, against BK = -1 (Unknown). Then the full Policy7 record arrived, so U loaded the HUB_POLICY with the actual non-zero Policy7 BK and corresponding SAT. The Sat’s for Policy7 are then distributed over two Hub keys in this situation.
In my opinion, if U needed to, U could resolve this in the Business Vault by having BHUB_POLICY with Policy7 BK and load its BSAT_POLICY with derived/computed values for that policy.
Thanks, did you mean in BDV BSAT_POLICY we could repopulate/replace the ‘-1’s with actual late arriving policy7 business key by applying some matching logic across the other descriptive attribute values?
Yes - that’s the solution approach. Once U have determined a match between a -1 (Unknown) SAT to the actual Policy Hub & SAT record, I prefer to have a Same-as-Link in BDV:
LINK_SA_ POLICY with HUB HK’s coming from
UNKNOWN (-1) HUB_POLICY
and the actual HUB_POLICY.
This way U can put your matching confidence level (e.g. 95%) in that Link’s SAT.
Also, U can leave the Raw Vault SAT’s as they were loaded and not have to update them in any way. Your choice.
On second thoughts, since your -1 (Unknown) HUB_POLICY may have SAT’s loaded from multiple late-arriving policies, your initial solution might be better.