Good day everyone!
Just a quick question regarding the use of Applied Date in the context of indexing or clustering. Is the recommendation on Snowflake to cluster the Satellite table by the parent hub/link Hashkey and Applied Date? In the default DV2.0 pattern the Load Date would have been used instead. Both Applied Date and Load Date are part of my standard Satellite table pattern.
cheers,
John
Hey John,
Clustering on hash key often is a recipe for bad performance from what I’ve seen in my experimenting, It’s not that common that a mart will only want to report on some of an entity so the partition pruning isn’t that effective.
Even when that’s your exact use case, the nature of hashes means that the subset you want are spread uniformly through your table so you end up picking up most of the partitions anyways.
IMHO it’s only really useful in the case when you’re tracking a handful of bks but even then it wouldn’t be worth it reduces the performance of that satellite for every other use case so the trade off is hardly considered worth it.
Snowflake does a pretty good job of clustering by load date timestamp automatically. Not sure if this changes in the context of your applied date field but if you were going to cluster on anything I’d pick that.
All the best!
Frankie
2 Likes
here’s everything you need to know
1 Like
Thanks @patrickcuba and @Frankie - interesting comments that i could just use the defaults or deploy a “C-PIT” table as I note from Patrick’s example.
Regarding the C-PIT - i see how the use of a dynamic table can simplify keeping it fresh.