There seems to be quite a bit of buzz around Data Mesh and the concept of delivering data products as API-driven services. Zhamak Dehghani has been talking about this in the wider IT industry and has published content on it, such as this talk: https://www.youtube.com/watch?v=_bmYXWCxF_Q
I would be interested in the general thoughts in this community on Data Mesh and how/if they see Data Vault aligning to this.
I like the modular business domain approach, but wonder how much of this is practical.
Hi @jhall_uk. I am curious about this topic as well. I am looking forward to the session at WWDVC by Paul Rankin of Roche.
Perhaps @VeronikaD has some insights on this.
From my understanding and exposure to it, it is another “practice” built on Data Mesh’s four pillars, turning what corporations were already doing from a monolithic approach into a decentralised, service-oriented approach. It’s nice to put a label to these patterns, and Data Mesh is a catchy title.
A good article to review is Martin Fowler’s article on it https://martinfowler.com/articles/data-mesh-principles.html which was a follow-up to https://martinfowler.com/articles/data-monolith-to-mesh.html It’s a great explanation from one of the top minds in architectural thinking.
As to the original question, Fowler’s view is that the data warehouse “simply” becomes one node in the architecture. It acknowledges that data will need to flow across domains to make the otherwise domain-specific data products useful. E.g., unless you have a single domain built around Customer, it’s likely that customer data will need to flow almost everywhere to add context.
His final summary lists four Agile-manifesto-style principles:
Serving over ingesting
Discovering and using over extracting and loading
Publishing events as streams over flowing data around via centralized pipelines
Ecosystem of data products over centralized data platform
Thanks @saqib for pointing me to that article. @kgraziano (as always) paints a very good picture which mostly answers my question, if the answer is indeed to still use a data warehouse in a data mesh world. The modular approach to building and extending DV2.0 models seems in part to be in line with the concept of data products, but DV models were never meant to be directly consumed by business users, right? So a hub, link and satellite are not really data product outputs; the mart objects are (even if they are just views of the same DV2.0-modelled data). The DV2.0 model would be buried within the data product logic, with the exposed output being dimensionally modelled. Perhaps that is actually the answer…
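To make that pattern concrete, here is a minimal sketch (all table and column names are hypothetical, and SQLite stands in for the warehouse platform purely for illustration): the Raw Vault hub and satellite stay internal to the data product, and consumers only ever see a dimensional view built over them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Raw Vault structures -- internal to the data product, never exposed
# directly to business users.
cur.execute("""CREATE TABLE hub_customer (
    hub_customer_hk TEXT PRIMARY KEY,   -- hash key
    customer_id     TEXT,               -- business key
    load_dts        TEXT,
    record_source   TEXT)""")
cur.execute("""CREATE TABLE sat_customer_details (
    hub_customer_hk TEXT,
    load_dts        TEXT,
    customer_name   TEXT,
    country         TEXT,
    PRIMARY KEY (hub_customer_hk, load_dts))""")

cur.execute("INSERT INTO hub_customer VALUES ('hk1', 'C-001', '2024-01-01', 'crm')")
cur.execute("INSERT INTO sat_customer_details VALUES ('hk1', '2024-01-01', 'Acme Ltd', 'UK')")
cur.execute("INSERT INTO sat_customer_details VALUES ('hk1', '2024-02-01', 'Acme Ltd', 'DE')")

# The exposed "data product" output: a dimensional view over the vault,
# returning only the latest satellite row per customer.
cur.execute("""CREATE VIEW dim_customer AS
    SELECT h.customer_id, s.customer_name, s.country
    FROM hub_customer h
    JOIN sat_customer_details s ON s.hub_customer_hk = h.hub_customer_hk
    WHERE s.load_dts = (SELECT MAX(load_dts)
                        FROM sat_customer_details s2
                        WHERE s2.hub_customer_hk = s.hub_customer_hk)""")

print(cur.execute("SELECT * FROM dim_customer").fetchall())
# -> [('C-001', 'Acme Ltd', 'DE')]
```

The consumer queries `dim_customer` and never needs to know the hub/satellite structures exist, which is the sense in which the DV2.0 model is "buried" inside the data product.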
Where I am still left a little in the dark (and this may just be data mesh blog/video overload on my part) is what I understood to be a goal of data mesh: moving towards real-time analytics from the source, without the need for an expensive, complex ETL and data warehouse. I thought that was part of the driving force behind the need for change, according to Zhamak Dehghani. ETLs and data warehouses “don’t work” in her view. I have a real problem with that statement - it’s too much of a generalisation. The art is in the method - poorly defined data warehouses don’t work, for all the reasons DV2.0 practitioners know all about.
It is indeed a matter of degrees and interpretation. The comments against data warehousing are really targeted at what she calls monolithic data warehousing, where the IT team does not understand the data and becomes a bottleneck to delivering data for analytical use. You know, the classic get-in-line scenario that leads to shadow IT and data silos.
Yes, well-run DV 2.0 projects generally do not suffer from this, especially if they are using automation. When I ran truly agile DW projects, we had the business SMEs on the teams and did not have the problems she cites.
What she proposed, in my mind, is more organizational in nature, with some architectural framework. A good agile data organization works this way already. One of the goals of data mesh is to decentralize the work of modeling the data and building the data pipelines. That means the domain team has to get the skilled workers or borrow them from IT. It also means the owners of the source data become responsible for the entire life cycle, from entry through to use for analysis. That means they are responsible for data quality too! That is a good thing.
But none of it says we need to throw out methodologies and architectures that do work. We just decentralize the work and responsibility to the business, with IT experts acting as guides and mentors, and probably establishing the standards so it all does work.
Really it is a newish take on federated data warehousing, as it should not result in new data silos.
So no, it is not about just exposing raw operational data, unless someone needs that too. Data products are transformed data that is easily usable by any and all downstream consumers. Some of those may be other teams tasked with organization-level views of the data across all departments. In that case the best data product for them may be the raw data vault structures, so they can in turn integrate all the data properly and then produce data products (i.e., information marts) for their own consumers. One team may produce multiple products from the same set of data, according to the needs.
@jhall_uk one thing I’d suggest is for all of us to stop thinking of the final analytical output as the only valid type of data product. It’s certainly a really important category of data product! But in our org we define the sourcing of a new system into Snowflake, or the creation of a new hub and sat as data products as well, and we communicate and celebrate each of them.
The audience for each data product is different, but each is no less valuable to the person who consumes it.