Hi all,
I’m working on a project that models political campaign finance data and the source system data we are using is messy to say the least. The business objects we’ve landed on for this specific case are Committee, Donor, Company, and Receipt.
- Committee: represents a political action committee (PAC).
- Donor: represents a person or other entity that donates to a PAC which is captured via a receipt.
- Company: represents the employer of an individual donor, or a business that is donating.
In the source data we have the following columns:
- Recipient committee id
- Donor name
- Donor address
- Donor employer
- Donor occupation
- Donor type (could be Individual, PAC, or Company)
- Donor committee id
- Receipt date
- Receipt amount
The committee hub feels straightforward as we can simply use the committee id columns to create a committee hub. When a committee is the donor, the Donor committee id column is filled in, otherwise it is null.
The business key for the company hub can be found across multiple columns depending on the donor type (Donor employer OR Donor name). When the Donor type is Company, the Donor employer column is null and the company name is in the Donor name column instead. The company names themselves are free text fields, so we master these later in the process.
Additionally, to uniquely identify a donor we must use the Donor name and Donor address columns.
In this case, is it appropriate to create a hard business rule to tease this out in the raw vault based on the Donor type? Or is Donor type somehow a child dependent key?