Hey everyone,
I have been trying for some time now to model our source data into a data vault. Overall, I would say I am reasonably accustomed to the data vault artefacts and have read up on data vault by using the “scalable data warehouse” and this user group blog.
For the project, I also had prerequisites for certain entities we want to model with data vault to satisfy reporting and analytics needs.
Anyway, I have severly struggled with turning the following source data into a data vault:
The business concepts we try to model from that are: Property, Building, Rentalunit, Lease (RentalContract) and Tenant.
For property we have a uuid which actually comes from salesforce and is probably the business key. In principle I would prefer the Property.name because that is how the business talks about their properties, but since different business units have different names for a property that is probably not the best idea.
Everything below this layer, however, is giving me a lot of headache.
As you my seem from the ER diagram, all child tables have a foreign key parent_id for their one-to-many relationship and an id as the primary key.
However, those id’s are not anything the buisness users know about. The only person knowing about them is essentially the source system maintainer.
For building and building_address we actually also have uuids but those are not coming from salesforce but are system generated. So neither the id, nor the uuid can really be used as a business key.
The most reasonable business key for building would probably be the address, because that is something business may actually talk about and have reporting requirements on, but the source does not enforce the address fields, i.e. they may be empty.
In general, splitting building and building_address (essentially the house_entrance if multiple are present) is not how the business needs the data reported.
For rentalunit we have basically the same issue, the unique key there is either the id or the internal_id which is a composite key - both of them are also not used by the business. I have tried to build a composite business key for this one which consits of the other business keys in the hierarchy, i.e. property.bk and building.bk, and some name for the rentalunit.
The tenant data we have is also a crude mixture of things: It mixes individual tenants, company tenants in addition to some employee roles.
For the commercial tenants the business key would most likely be some kind of VAT ID, but this attribute is also not enforced.
For residential tenants one kind consider using their first_name, last_name combination but there are bound to be duplicates, another bk may be their email - but as you may guess be now, neither of those fields is mandatory.
I have essentially several issues:
- The business keys I identified are sometimes not unique in the source system, i.e. there are duplicates (which should not be there) but the issue is cirumvented from the source side by simply slapping a new ID on any such tuple.
- In a lot of cases I actually need a composite key so I have to join several elements of the source system which I currently do in “pre-staging” but this is of course something I have to maintain myself
- In several cases, joining multiple tables is also necessary to filter out entries which are no longer active, since the source also does not provide that service (an issue which arises because of the duplicates which have a new ID as described above).
- How do you handle something like the rentalunit business key, which is a composite key consisting of property.bk:building.bk:rentalunit.name. Is that a valid thing to do in data vault?
- What about rental contracts? I would argue that a contract is actually a link between two parties and would probably treat the contractual item (the rentaluint in the case of rental contract) as a dependent child key. On the other hand, there are attributes which actually live on the contract level so it may also be a hub?
Do you have any suggestions for me? Does my data vault model even make sense or is there a better way to do that?
Thanks for your help!