I have a question on how to create a model in the information mart corresponding to a specific Link-Sat combination. Creating a fact table might be an obvious answer, but the satellite hanging off the link stores history, hence want to make sure that this the correct approach.
Explaining through an example
There is a Hub and Sat Product which consists of the base attributes of the product
There is a Hub and Sat Banner with the details of the retail banners under which these products are sold
Then there are banner specific attributes of a product like the website specific description, website URL etc, that I have put in a satellite hanging off a link between product and banner.
Now these attributes are non-additive and can change over time as shown in the example for max purchase quantity.
Now if you look at the second screenshot for the information mart layer
It shows an illustration of Hub & Sat Product being converted into dim_product
Hub & Sat Banner being converted into dim_banner.
The link between product and banner and the corresponding satellite converted into a factless fact table with product_key and banner_key
My concerns are the following:-
This fact table will need to store records as a SCD type 2 to reflect the changes captured in the satellite.
This fact table will contain all attributes from the satellite making it quite wide
Do you think this is the right approach?
I don’t want to manufacture a dimension which is a combination of product and banner.
Apologies, maybe I am unable to explain my question.
The satellite hanging off my link needs to be converted into a SCD type 2.
The grain of this satellite is a combination of keys coming from 3 different hubs which forms the link.
These attributes can’t be aggregated, like I have shown in the example.
How do I convert this Link + Sat into a fact and dimension?
There is no natural key for this dimension as it is a combination of 3 concepts coming together.
Note: These 3 concepts will have their own separate dimension tables containing their base attributes.
These attributes are over and above that.
Maybe the problem is how I have done the DV layer itself?
From what i can understand, I think you are trying to create a Factless Fact table? Or, is your Fact table tracking the Sales of Products?
I’ll propose an option based on my understanding but perfectly happy for those more in the know to correct me if I’m barking up the wrong tree.
If you are trying to create a Factless Fact table then I think you need to determine a grain at which you want to be able to track the ‘Facts’. For example, do you want to know what the Product, Banner and Product_Banner attribute combinations are for every day, or, at the end of a month?
I think you need to determine your grain and then utilise the approach of a creating a PIT table to assist in the creation of your Fact table. Keep the attribute information in the Satellites / Dimensions (all 3 of them). Your Fact table should then include a Date column (to denote your grain) and then Dimension Key columns for each date (row), which link back to your Dimensions to retrieve the correct attribute information for that particular date.
Your Dimensions are created from your Satellites. You will need to create a DimensionKey to denote row uniqueness. You cannot use the HUB Key as this will not denote row uniqueness.
You cannot use your DiffHashKey as this could also not be unique if attributes revert back to a historical combination of attributes. You may have a sequence number column you can use. Or, in my case I created another HashKey value by combining the Hub_Key and the LoadDate and then utilised this as my DimKey.
Your Fact table is now no longer an SCD2 as the attributes exist in the Dimensions (which are created from your Satellites). But, you need to determine a grain that meets your need for the Factless Fact table.
Alternatively, if your Fact table is tracking the sale of Products then your Grain is the date of sale but you still need to include DimensionKeys for each of the 3 Dimensions to return the correct attribute information for the date of sale.
I hope I have helped in someway, even if to generate a clearer discussion
Thanks Carl, yes I am trying to create a factless fact table, however the I can’t move the attributes into individual dimensions, since these attributes exist against a combination of Product and Banner.
Like “maximum quantity” you can purchase is not just a banner or product attribute.
You can purchase 6 bottles of Cat amongst pigeons from Liquorland
You can purchase 8 bottles of Cat amongst pigeons from Vintage Cellars
You can purchase 3 bottles of Jacob’s Creek from Vintage Cellars
As you can see this value differs for the combination of Product and Banner
Also this maximum quantity can change over time and hence the need of storing SCD2 data,
So to answer your question I want the product-banner attributes to be tracked whenever they get changed, similar to tracking changes to only product attributes or only banner attributes.
Product-Banner in itself maybe qualified to become a ‘manufactured’ dimension, if not a factless fact, but I am not sure if that’s the right thing to do - hence the original question.
It looks like your sat_product_banner is a Link Sat off of link_product_banner. So, can you not treat sat_product_banner as any other Sat and track your attribute changes over time? Your ‘Product Banner HK’, from the link, acts as a sort of HUB_Key but then you create a Hash Diff Key to track changes in your SAT. If needed as a Dimension later you can just use your sat_product_banner SAT as a source for your dimension and create a DimKey by either creating a hashkey of ‘Product Banner HK’ and ‘LoadDate’ or by using a sequence number if you have one.
Hey Carl, “It looks like your sat_product_banner is a Link Sat off of link_product_banner”
Correct, if you look at my original post, that’s exactly what I have shown in the diagram
My problem isn’t tracking the changes in DV layer, that’s very straightforward.
My question always was regarding converting this structure into either a factless fact or a dimension table, and both of them don’t seem to be an ideal approach
With a dimension approach - I am creating a new business concept of Banner-Product which doesn’t really exist in real life, even though there are attributes against it.
With the factless fact approach - I need to create a really wide fact table with SCD2 type data,
My question is which is the better approach of the two or is there a better way of handling such a scenario.
Unfortunately this type of scenario is not covered in Dan’s book or the links that Patrick shared.