what’s the most popular approach to naming satellites when there are frequent schema changes in the source ?
I am thinking about the scenario where we have a sat_customer table coming from the CRM, and then new columns are added to the source table.
is it better to rebuild the sat_customer table with the new columns along with its hashdiffs ?
or to create a separate satellite that holds the new columns ? And in the latter case, what would you name that additional satellite table if the new columns do not share a common business theme ? something like sat_customer_2 ?
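to make it concrete, here is roughly what I mean by the second option (just a sketch, all table and column names are made up):

```sql
-- rough sketch of the "separate satellite" option (all names made up)
-- the original satellite and its hashdiff stay exactly as they are:
CREATE TABLE sat_customer (
    customer_hk   CHAR(32)   NOT NULL,  -- hash key from hub_customer
    load_dts      TIMESTAMP  NOT NULL,
    record_source VARCHAR(50),
    hashdiff      CHAR(32),             -- hash over first_name, last_name, email
    first_name    VARCHAR(100),
    last_name     VARCHAR(100),
    email         VARCHAR(255),
    PRIMARY KEY (customer_hk, load_dts)
);

-- the new CRM columns land in a second satellite with its own hashdiff,
-- so the existing table and its history are never rebuilt:
CREATE TABLE sat_customer_crm_ext (
    customer_hk   CHAR(32)   NOT NULL,
    load_dts      TIMESTAMP  NOT NULL,
    record_source VARCHAR(50),
    hashdiff      CHAR(32),             -- hash over loyalty_tier, opt_in only
    loyalty_tier  VARCHAR(20),
    opt_in        BOOLEAN,
    PRIMARY KEY (customer_hk, load_dts)
);
```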
thanks !
I guess this works well for most cases.
I would still challenge the approach a little in the case where the source schema changes frequently and the satellite table has grown to a certain size. Wouldn't rebuilding it each time be challenged by the people in charge of the platform ?
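for instance, I would have expected something like this to be possible instead of a full rebuild (just a sketch, assuming the platform supports ALTER TABLE and that we accept a one-off "everything changed" delta at the cutover):

```sql
-- hypothetical in-place evolution instead of a rebuild (names made up)
-- new columns are added as nullable, so existing rows are untouched:
ALTER TABLE sat_customer ADD COLUMN loyalty_tier VARCHAR(20);
ALTER TABLE sat_customer ADD COLUMN opt_in       BOOLEAN;

-- from the next load onwards, staging computes the hashdiff over the
-- extended column list, e.g.:
--   MD5(first_name || '|' || last_name || '|' || email || '|'
--       || COALESCE(loyalty_tier, '') || '|'
--       || COALESCE(CAST(opt_in AS VARCHAR), ''))
-- historical hashdiffs are left as-is, which means the first load after
-- the cutover will flag every customer as changed exactly once.
```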
what would you argue against anchor modeling apart from the large number of tables and joins ? the current topic for instance is irrelevant as every table has one attribute and no refactoring is required
Try telling your database administrator that every column is now a table and that you need to join dozens of tables to return a single result, let alone trying to keep business entity state consistent!
It’s a nice academic exercise, but at least I have never seen it used at production scale.
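To put a number on "dozens of joins", here is a toy example (invented names) where even a three-attribute view of a customer already needs three joins:

```sql
-- toy example (invented names): in an anchor-style / 6NF model each
-- attribute is its own table, so a three-column view already costs
-- three joins against the anchor:
SELECT a.customer_id,
       fn.first_name,
       em.email,
       lt.loyalty_tier
FROM   an_customer               a
LEFT JOIN at_customer_first_name fn ON fn.customer_id = a.customer_id
LEFT JOIN at_customer_email      em ON em.customer_id = a.customer_id
LEFT JOIN at_customer_loyalty    lt ON lt.customer_id = a.customer_id;
-- a real entity with 40 attributes means 40 such joins, and historized
-- attributes also need latest-row-per-key logic inside each one.
```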
I guess it depends… massive historization of millions to billions of rows, vs SQL Server-like workloads.
If they’re using Anchor for the former, I’d like to find out the cost of running those join queries! (an academic exercise in its own right!)
If it’s for the latter then I guess the cost of the model will not outweigh the value it produces.
Of course, I’m not always right! I have seen customers/prospects choose data vault just because “they want to do data vault.” That is never a good reason to adopt it, and they subsequently tell their friends not to do data vault.
got you !
what you said is exactly what is happening with a customer I am talking to right now. They went for data vault a year ago because they wanted data vault. Today it has gotten out of hand and they are asking what they are doing wrong.
regarding anchor,
there will indeed be far more joins to perform to consume data from the “anchor” model, but with careful SQL, couldn’t those joins be performed on already pre-filtered data (usually benefiting from partition pruning) ?
so big tables wouldn’t be fully scanned.
and other tables storing attributes that do not change that frequently could be broadcast before the joins, roughly like the sketch below.
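here is roughly what I have in mind (invented names; the BROADCAST hint is Spark SQL syntax, so that part assumes a Spark-like engine):

```sql
-- sketch of pre-filtered anchor joins (invented names; the BROADCAST
-- hint below is Spark SQL syntax, so it assumes a Spark-like engine)
WITH recent_loyalty AS (
    -- filter first: with load_date as the partition column, only the
    -- matching partitions are scanned, not the whole historized table
    SELECT customer_id, loyalty_tier
    FROM   at_customer_loyalty
    WHERE  load_date >= DATE '2024-01-01'
)
SELECT /*+ BROADCAST(co) */
       a.customer_id,
       co.country,        -- small, slowly changing attribute: broadcast it
       rl.loyalty_tier
FROM   an_customer         a
JOIN   at_customer_country co ON co.customer_id = a.customer_id
JOIN   recent_loyalty      rl ON rl.customer_id = a.customer_id;
```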
I can clearly see the nightmare of trying to put together queries with hundreds of tables to create an information mart, but I still cannot clearly see why query performance would be an issue. I might be completely missing something !