We are pulling a list of stores from our ERP system. In the ERP system, there is a single table for stores. Some of the attributes are:
Address (multiple fields)
Store Size (S, M, L, XL) - these have lookups in the ERP and stand for Small, Medium, Large, Extra Large
Planning type (NP, PE, PN, PU, UN) - these have lookups in the ERP and stand for Not Planned, Plan existing, Plan new, Plan unopened, Unopened.
Now my question, is store size or planning type a business key? if so then I would model these as hubs and link them to the store? These are really just reference data in our ERP, but who knows, someday we may have additional attributes associated with the store size or planning type - likely not, but there is a chance.
I see 3 possible solutions (lets use planning type as an example):
Create a reference table for planning type, leave planning type attribute in the store satellite, consumption model or mart can join to the reference table to get the description
Create a reference table for planning type, leave planning type attribute in the store satellite, add planning type description attribute to store satellite (populate by grabbing desc from reference table)
Create a hub for planning type, satellite for planning type (business key is the planning type), satellite will have the description. Create a link (or add to existing) to associate planning type hub with the store hub.
The way i see it, creating a hub/satellite for planning type adds complexity where its not necessarily needed. but if we ever get additional information about planning type, then having the hub / satellite makes sense. but if we never get additional information, then its unnecessary overhead.
Any help, thoughts, ideas are greatly appreciated.
Neither, think about it — are these uniquely identifiable?
Most of what you described look like reference data to me, perhaps if you do not have business keys then maybe a data vault is not the right modelling paradigm for your data?
We definitely have a business key for the store itself. so there is a store hub, store satellite. when we bring sales transactions, there will be a link that connects the store to the product (plus other hubs as needed).
but the store domain itself, we are trying to decide if something like planning type warrants another hub/satellite or just leave as an attribute of the store satellite.
and what do you mean by uniquely identifiable? our source will have one of these 5 values (and null) - NP, PE, PN, PU, UN. these are unique values, configured in our ERP. these values (or codes) will be assigned to each store when the store in created in the source system. also in the source system is a lookup table where each of NP, PE, PN, PU, UN have a description. in the source system they considered a lookup table. in the source system application, the lookup is done and the description is shown on reports, etc.
Yea, keep this as simple as possible, if you design everything as a data vault hub, link and sat you end up with way too many artefacts to join on, whereas ref content is meantt to enrich your master & transaction data
I’m fairly new to Data Vault myself so i wouldn’t consider my opinion as expert advice but i thought i would provide my opinion and to add my own experience.
I would say that Store Size and Planning Type should be attributes within your Store Satellite.
With regard as to whether the reference description should be within your satellite or not then Patrick’'s book ‘The Data Vault Guru’ provides advice on this on pages 200 - 202. The book advises that if reference descriptions are resolved and provided as part of the SAT table staging load then you do not need to split out the code and the descriptions and these can be loaded into the SAT. However, if your staging file holds the reference code only and you require a lookup to a reference table then the descriptions should not be resolved during the satellite load and should be resolved in the InfoMart layer with a look up to the reference table. This ensures that there is no unnecessary dependency between the Satellite table and the reference table.
In order to provide Point-in-Time capabilities of descriptions for your reference codes (if these can change) you can track the changes in your reference table in a similar way to tracking changes in your SAT. But, for the reference table you do not need to create an Hash Key of the Reference Code and you can use the Reference Code itself as the Primary Key.
Personally, we have included the descriptive reference data in our Satellites. To be honest, this is because we have followed the approach of a 3rd party and resolved the reference lookups during our satellite load before I read pages 200-202 of Patrick’s book providing reasons as to not do so. So, we do have a dependency on the reference tables being loaded prior to our SATs. However, we now have less required joins in our InfoMart for reference data (albeit the joins are now in the DV layer).
With regard to the possibility of additional attributes becoming available for Planning Type and the possible need to refactor your design I have some considerations:
How much future planning can you really do? I suppose take a risk based approach given what the likelihood is of Planning Type becoming a Business entity in itself and requiring a HUB etc versus resource constraints and the ability to refactor, if required.
Consider whether the new Planning Type attributes could be additional attributes of the Store itself. if so, then add to the Satellite, if not, then… see point 3
I’ve only just thought of this so I’m not sure what Patrick might think of this, but, maybe you could use Panning Type as a Dependent Child Key in your Satellite (as a way of future proofing)? With this approach i envisage that Planning Type would still not be a Business Concept but you would be able to track attribute changes within the Planning Type using the Store & Planning Type key combination. This is just a loose idea that occurred to me so i’d be interested in all the reasons why not to do this