I have a file that has the customer’s address (street1, street2, city, state, zip) and another file that has the zip and the metro ID (along with many metro related flags) that the zip falls into. In your expert opinion, should I:
Load the zip into its own HUB_ZIP table and the metro into a HUB_METRO and link them together with that second file
Simply load a REF table with ZIP_CODE, METRO_ID, METRO_NAME or something to that effect?
There are lots of reports written around METRO metrics so it needs to be a “clean” solution, whatever I do here. Thanks in advance!
is Zip a business key? Seems like a look up code to me and should therefore be loaded to a REF table to enrich DV content on the way out in the Information Mart.
Looking at the question as a whole, you’d probably have a hub_customer with address content loaded in a satellite off that hub, at consumption (query time) include the enrichment as part of the information mart.
Let me give some more context:
DEFAULT_ADDRESS_ID, and several attributes related to the customer but nothing else to do with the address.
ZIP_CODE, along with several metadata date fields from the source such as created, updated, and deleted.
METRO_ID, and several attributes that are specific to the combination of zip and its parent metro area.
Since the address isn’t actually provided in that first file and the second file has the
ADDRESS_ID that the
DEFAULT_ADDRESS_ID field ties to, I made a
HUB_ADDRESS table and then setup a link from customer to address that is loaded from both.
The third file has nothing to do specifically with the customer but many reports need to know which metro a customer falls under. There are also store locations which link to an
ADDRESS_ID and, based on the address zip code, ties to the respective metropolis (metro). Each metro has all sorts of flags/switches as mentioned earlier.
I agree with you that zip doesn’t seem like a business key and shouldn’t be loaded into a hub but I don’t know what the best approach would be for linking all that data together as described above. Should it be modeled more “basically” in the vault and all that linking taken care of in the presentation layer (data marts)?
Zip sounds like enrichment, no need to explicitly link the data in the DV — just enrich the data on the way out of the data vault in an information mart.
Imagine if you were to have dozens of address tables with zip in it, you’d end up with way too many DV artefacts to join to.