In one of our source systems we have several “attribute” tables that essentially extend the primary tables to which they are linked. This system tracks access control across the corporation.
For example, our Application table, which keeps common metadata about applications together, links to an ApplicationAttribute table, which extends or customizes each application in the parent table. The attribute table is essentially a key/value pair table (dictionary, hash table) with a unique surrogate key and the foreign key.
- AppAttributeId - surrogate key
- AppId - foreign key
There are other attribute tables modeled in much the same way for other concepts within the system…RoleAttribute, AccountAttribute, ApplicationAccountXrefAttribute, etc. The structure is essentially constant: a surrogate key, the foreign key, the DataKey and the DataValue.
To avoid influencing the discussion with my ideas, I’d like to hear some initial thoughts on how to model this in the data vault without modeling the source system (although, I’m not aware of any other system that models data this way).
I’m assuming the Business Key is AppId + DataKey? In that case I would create an MSAT with DataKey as Dependent Child Key and AppAttributeId as a plain attribute.
If not, then I think you’d have to use AppAttributeId as DCK.
Not sure what happens when a KVP gets deleted? Is it a hard delete? Or did you leave out the DeletedFlag for simplicity’s sake? In any case, if it is hard delete, you capture it because of the set-based pattern of the MSAT.
Not sure how many different DataKey values you can expect and how much you actually need towards your information mart… I currently use a business rule to pivot some of the frequently used DataKey values into a BDV satellite that’s on the same grain as (in your case) AppId.
@FrenkLoonen - Excellent suggestions! Thank you.
And no…I didn’t leave out a deleted flag. None of the KVP’s are ever deleted, only updated…at least in the case of the application attributes. We do have a (poorly implemented) table in the source system that tracks role questions, and it allows users to create their own KVPs. Those can be deleted. Thankfully, we aren’t pulling that information into the DW yet.
One problem at a time…
Are you loading semi-structured content into a variant column? I do have an upcoming blog covering this.
Alternatively Ensemble Data Vault has a Satellite table pattern called Name-Pair Satellites: a “Key” column and a “Value” column.
Thank you for the response, @patrickcuba.
I suppose the data, in some of the cases, can be considered semi-structured. I would be interested to read your blog post. I assume on Medium, correct?
It will be posted to here: Patrick Cuba - Snowflake
And eventually on Medium