Filtering content of the source - hard rule?

DVBeginner · 24 April 2023 09:11

Hi

I have a question regarding where to place the data filtering logic.
There is an interface that mixes data for different concepts (e.g. entries for offer and contract).
Is it best practice to filter the subsets when loading the hub/satellite of the offer and the contract respectively?
Or is it better to load everything twice, into both sets, and filter in the business vault?
In general, two kinds of filters might be needed to handle this and similar cases:

simple filter - an attribute in the source states the type of the object (offer, contract)
pre-join - the data source needs to be joined to another source in order to determine the type of the object.
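A minimal Python sketch of the two cases, assuming the staging rows have already been read into memory. `SourceRow`, `entry_id`, `object_type`, the `"OFFER"`/`"CONTRACT"` codes and the `type_lookup` mapping are hypothetical names chosen for illustration. Rows whose type cannot be determined go to a separate `unresolved` bucket instead of being dropped; the pre-join case is where such rows typically appear.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SourceRow:
    # Hypothetical staging record; field names are illustrative only.
    entry_id: str
    object_type: Optional[str] = None  # populated only in the "simple filter" case


Split = Tuple[List[SourceRow], List[SourceRow], List[SourceRow]]


def _route(rows: List[SourceRow], type_of: Callable[[SourceRow], Optional[str]]) -> Split:
    """Bucket rows into offers, contracts and unresolved; no row is discarded."""
    offers, contracts, unresolved = [], [], []
    for r in rows:
        t = type_of(r)
        if t == "OFFER":
            offers.append(r)
        elif t == "CONTRACT":
            contracts.append(r)
        else:
            unresolved.append(r)  # kept for inspection, not lost
    return offers, contracts, unresolved


def route_simple(rows: List[SourceRow]) -> Split:
    """Simple filter: the row itself states whether it is an offer or a contract."""
    return _route(rows, lambda r: r.object_type)


def route_prejoin(rows: List[SourceRow], type_lookup: Dict[str, str]) -> Split:
    """Pre-join: the type comes from a second source, looked up by entry_id."""
    return _route(rows, lambda r: type_lookup.get(r.entry_id))
```

The `unresolved` bucket makes the cost of each rule visible: with a simple filter it catches unexpected type codes, with a pre-join it catches rows that have no match in the other source.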

Is there a best practice to tackle such complexities of the data sources?

Regards

Nicruzer · 26 April 2023 11:56

I’m not completely clear on the core issue here. Do you have sample source data that you can share?

It sounds like your source is just a relational table (i.e., junction table, associative entity, cross-reference [xref], link, many-to-many, etc.); however, I may just be misunderstanding that.

Given what you’ve shared, it sounds like you’ve considered the major business concepts to model for this particular use case, correct?

Nat · 27 April 2023 09:54

I would start with the overall business landscape. If you treat quotes and orders/contracts as different things (and most companies do), or if you expect multiple systems where some keep the quote separate from the order/contract and one has them integrated, I'd split in staging so that you can manage hubs for quotes and orders separately.

Xero does this. It is quite weird in my experience.

patrickcuba · 27 April 2023 23:00

By exception only:

Ideally the source provides the pre-joined and filtered content; this should be in your interface contract / SLO.
If you do this in your staging, it becomes a point of maintenance for you, but that is still better than post-filtering in the data vault.

Nat · 28 April 2023 13:43

Problem is when using public APIs you don’t usually get much of a choice on what you get. And the public API pattern is becoming way more common in my experience.

DVBeginner · 9 May 2023 13:51

The data source is a custom-built system that provides data mostly in a mixed format (generic technical attributes plus business content in XML).
Within the XML there is an indication of whether a given entry describes an Offer or a Contract concept.
Currently there is no way for the source to provide proper interfaces (one for offer, another for contract). Thus we load the source 1:1 and, in the stage, apply filters or joins to another source in order to figure out the context.
Is filtering or prejoin justifiable in this case?
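For the XML case described here, one option is to derive the Offer/Contract indicator in staging as an extra column while keeping every row, leaving the actual split for later. A minimal sketch using Python's standard `xml.etree.ElementTree`; the `<ObjectType>` element, the `payload` key and the `UNKNOWN`/`UNPARSEABLE` placeholder values are assumptions, since the real XML schema is not shown in the thread.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def derive_object_type(payload: str) -> str:
    """Extract the concept indicator from the XML payload.

    Assumes a hypothetical <ObjectType> child of the root element;
    the real path depends on the source system's schema.
    """
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return "UNPARSEABLE"  # flag it, do not drop the row
    value = root.findtext("ObjectType")
    return value.strip().upper() if value else "UNKNOWN"


def stage(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Add a derived object_type column to every row; no row is removed."""
    return [dict(row, object_type=derive_object_type(row["payload"])) for row in rows]
```

Because the derivation only adds a column, the staged output keeps the same row count as the 1:1 source load.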

AHenning · 10 May 2023 04:09

It is never justifiable to apply soft business rules before loading the data into the raw vault. Filtering and pre-joins are examples of soft business rules. Don't lose data!

Nat · 11 May 2023 15:25

Even if your business concepts don't match what you have in the source? Not sure I buy that.

AHenning · 13 May 2023 11:43

Hello Nat!
I have never seen business concepts align 100% with the sources.

DVBeginner · 15 May 2023 07:55

Hi Henning,
What is the alternative in your opinion?
Creating technical objects in Raw Vault (e.g. Case, CaseObject, CaseCalculation in our case) to capture the data and then the proper objects in the business vault (Offer, Contract)?

Regards

AHenning · 16 May 2023 20:08

Basically yes. I accept that some parts of the data vault are source-centric.
I would never ask the source system to filter or apply any fuzzy logic, because it always ends the same way: with you losing data!
Good luck.
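The source-centric approach discussed in the last two posts (a raw vault hub for the source's own concept, with Offer and Contract derived in the business vault) can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: `case_id`, the record layout, the MD5 hash key and all function names are hypothetical, and history tracking (hash diffs, end-dating) is omitted.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def hash_key(*parts: str) -> str:
    """Hash key over normalised business key parts (MD5 used only for illustration)."""
    normalised = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def load_raw_case(staged_rows: List[dict], load_date: str, record_source: str) -> Tuple[Dict[str, dict], List[dict]]:
    """Raw vault: one source-centric hub holds every case, whatever its type."""
    hub: Dict[str, dict] = {}
    sat: List[dict] = []
    for row in staged_rows:
        hk = hash_key(row["case_id"])
        # Insert the business key only once per hash key.
        hub.setdefault(hk, {"case_id": row["case_id"], "load_date": load_date, "record_source": record_source})
        # The satellite keeps the raw descriptive content, including the type indicator.
        sat.append({"hk": hk, "load_date": load_date, "object_type": row["object_type"], "payload": row["payload"]})
    return hub, sat


def business_split(hub: Dict[str, dict], sat: List[dict]) -> Dict[str, Set[str]]:
    """Business vault: derive Offer / Contract (and leftover) sets from raw case data."""
    by_type: Dict[str, Set[str]] = defaultdict(set)
    for rec in sat:
        by_type[rec["object_type"]].add(hub[rec["hk"]]["case_id"])
    return dict(by_type)
```

The raw vault here never applies the type rule, so every case is retained; if the rule for deciding Offer versus Contract changes, only `business_split` has to be rerun.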
