Data mesh is a technique for developing a decentralized data architecture through the use of a domain-oriented and self-service design. Centralized data management systems have somehow become the de facto method for data analysis.
Moving data to a target system like Snowflake, Amazon Redshift, or Azure Data Lake Storage seems to be the only way for us to analyze all the data that flows in and out of an organization.
Now, many successful data-driven organizations are adopting a different approach to data management architecture.
Before we take a deep dive into data mesh, let's look at the most popular way to analyze data right now: moving it into a data warehouse or data lake.
How you get data to a target system depends on the type of data you have, how much data you have, and other factors.
Whatever option you choose, you need to move large volumes of data to and from a centralized repository. That can involve custom code, metadata management, complex data pipelines, data engineers, and data scientists.
Unfortunately, relying on this centralized approach to data management throws up multiple problems.
Data mesh is best understood as a set of principles for designing modern data architectures. It is a type of data platform architecture that distributes responsibility for handling data across domains instead of concentrating it in a single central system.
Data mesh lets us query data without first moving it to a lake or warehouse. It relies on a decentralized data management strategy, allowing businesses to operationalize and analyze data from within a data source.
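To make "querying data without moving it" more concrete, here is a minimal Python sketch. It uses SQLite's `ATTACH DATABASE` to join tables that live in two separate databases, standing in for two domains' own data stores, without first copying either into a central repository. Treat it as an analogy only: a real data mesh would typically rely on a distributed query engine spanning many systems, and the table names, columns, and rows here are hypothetical.

```python
import sqlite3

# Hypothetical setup: two separate in-memory databases stand in for two
# domain-owned data sources.
conn = sqlite3.connect(":memory:")                        # the "sales" domain's store
conn.execute("ATTACH DATABASE ':memory:' AS marketing")   # a second, separate store

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)",
                 [(1, 120.0), (2, 80.0), (1, 30.0)])

conn.execute("CREATE TABLE marketing.campaigns (customer_id INTEGER, campaign TEXT)")
conn.executemany("INSERT INTO marketing.campaigns VALUES (?, ?)",
                 [(1, "spring_promo"), (2, "newsletter")])

# Join across both stores in place: nothing is copied into a central warehouse first.
rows = conn.execute("""
    SELECT c.campaign, SUM(o.amount) AS revenue
    FROM main.orders AS o
    JOIN marketing.campaigns AS c ON c.customer_id = o.customer_id
    GROUP BY c.campaign
    ORDER BY c.campaign
""").fetchall()

for campaign, revenue in rows:
    print(campaign, revenue)
```

Each domain keeps its own tables; the query reaches into both where they live and returns only the aggregated result.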
Instead of moving data to a separate target system, business teams (also called "domain owners" or "domains") take charge of data management and ownership, which removes many of the issues associated with monolithic data architectures.
Data mesh uses a domain-oriented, self-serve design to embrace the omnipresence of data in the company.
Think decentralization, not centralization. It's an entirely new way to access and analyze data.
Data mesh is the brainchild of Zhamak Dehghani from ThoughtWorks. She argues that business teams should be responsible for the data in their own domains and should treat that data as a "product" they can share with others. The idea is similar to microservices, but it applies to business domains.
Conceptually, data mesh is very similar to microservices, but the two differ in maturity: microservices are a well-established practice, while data mesh is still a relatively new idea for many data engineers.
Note: You can still use a lake or warehouse as part of your data mesh strategy, but these systems are complementary rather than required.
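There are four main pillars, or principles, of data mesh: domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance.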
These four principles eliminate many of the pain points associated with centralized data management. Data mesh makes data more available and secure for business teams while still connecting distributed or siloed data across multiple locations.
We know the data mesh architecture is a decentralized one, but what does this mean in practice? Data mesh gives the responsibility of a data domain (and its operational and analytical data) to the relevant business team.
The team then uses operational data to build data products and analytical data models.
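What a "data product" looks like varies between organizations. As a minimal sketch, assuming a team describes each product with an owner, a schema, and a freshness promise, a descriptor might look like this in Python. The `DataProduct` class and its fields are hypothetical, not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes alongside its data."""
    name: str                    # product name, unique within the domain
    domain: str                  # owning business domain
    owner: str                   # team accountable for quality and access
    schema: dict[str, str]       # column name -> column type
    freshness_sla_minutes: int   # maximum staleness promised to consumers

    def describe(self) -> str:
        """Return a one-line summary consumers could read in a catalog."""
        columns = ", ".join(f"{col}:{typ}" for col, typ in self.schema.items())
        return f"{self.domain}/{self.name} (owner={self.owner}) [{columns}]"


daily_revenue = DataProduct(
    name="daily_revenue",
    domain="sales",
    owner="sales-data-team",
    schema={"day": "DATE", "revenue": "DECIMAL"},
    freshness_sla_minutes=60,
)
print(daily_revenue.describe())
```

The point of a descriptor like this is that ownership and expectations travel with the data, so consumers in other domains know who is accountable and what they can rely on.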
Business data teams help decide the security, connectivity, and interoperability standards for data mesh projects and apply them within their own domains, enabling faster data analysis.
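Shared standards only help if every domain actually follows them. In data mesh terms this is federated computational governance: rules are agreed across domains and then enforced automatically rather than by a central team reviewing every change. One way to do that, sketched here in Python, is to express the rules as a check that runs against every data product descriptor. The required fields and allowed types below are invented for illustration.

```python
# Hypothetical mesh-wide standards that every domain agrees to follow.
REQUIRED_FIELDS = {"name", "domain", "owner", "schema"}
ALLOWED_TYPES = {"STRING", "INTEGER", "DECIMAL", "DATE", "TIMESTAMP"}


def check_product(product: dict) -> list[str]:
    """Return the policy violations found in one data product descriptor."""
    problems = []
    # Every product must declare who owns it and what it contains.
    for field in sorted(REQUIRED_FIELDS - set(product)):
        problems.append(f"missing field: {field}")
    # Column types must come from the shared vocabulary so domains can interoperate.
    for column, col_type in product.get("schema", {}).items():
        if col_type not in ALLOWED_TYPES:
            problems.append(f"column {column!r} uses non-standard type {col_type!r}")
    return problems


sales_product = {"name": "daily_revenue", "domain": "sales", "owner": "sales-data-team",
                 "schema": {"day": "DATE", "revenue": "DECIMAL"}}
leads_product = {"name": "leads", "domain": "marketing",
                 "schema": {"email": "VARCHAR"}}

print(check_product(sales_product))  # []
print(check_product(leads_product))
```

Each domain still builds its own products; the check simply gives every team the same, automatic answer to "does this meet the agreed standards?"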
Engineers will still need to design the architecture for their respective organizations, but data mesh can improve cross-domain data analysis and interconnect data in a way that's similar to APIs in microservices.
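The comparison with microservice APIs also suggests treating a data product's schema as a contract that consumers depend on. Here is a minimal sketch, assuming a simple rule set chosen for illustration: adding a column is safe, while removing or retyping one breaks consumers. The function name and rules are hypothetical.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List schema changes that would break consumers of a published data product.

    Assumed rule set: new columns are allowed; removed or retyped columns are not.
    """
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"removed column {column!r}")
        elif new[column] != old_type:
            problems.append(f"column {column!r} changed type {old_type} -> {new[column]}")
    return problems


v1 = {"day": "DATE", "revenue": "DECIMAL"}
v2 = {"day": "DATE", "revenue": "DECIMAL", "currency": "STRING"}  # additive change
v3 = {"day": "STRING"}                                             # drops and retypes columns

print(breaking_changes(v1, v2))  # []
print(breaking_changes(v1, v3))
```

As with API versioning, a domain that needs a breaking change would publish a new version rather than silently altering the existing one.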
There's nothing 'wrong' with data warehousing or 'laking'; both can lead to excellent analytics. However, both come with challenges, and we've experienced plenty of them. Data mesh can eliminate many of these issues.
Centralized data repositories advertise long lists of security features on their websites, such as user access controls, encryption, and data classification. However, even if a platform is secure, moving sensitive data between locations isn't always legal.
The truth is, data governance legislation is becoming increasingly strict, restricting companies from transferring personal data between sources and a centralized target system unless it's necessary to fulfill a clearly defined purpose.
Take GDPR, for example, which applies to organizations that process personal data of people in the European Union and the United Kingdom.
Failing to adhere to GDPR's rigid data processing rules could result in fines of up to 20 million euros or 4 percent of global annual turnover from the previous year, whichever is greater.
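Because the "whichever is greater" rule is easy to misread, here is the arithmetic as a tiny Python function. It only computes the ceiling for GDPR's higher tier of fines; actual fines are decided case by case by the supervisory authorities. The function name is our own.

```python
def gdpr_max_fine_eur(prior_year_turnover_eur: float) -> float:
    """Ceiling of GDPR's higher fine tier: EUR 20 million or 4 percent of the
    previous year's worldwide annual turnover, whichever is greater."""
    return max(20_000_000.0, prior_year_turnover_eur * 4 / 100)


print(gdpr_max_fine_eur(100_000_000))    # 4% is 4 million, so the 20 million figure applies
print(gdpr_max_fine_eur(2_000_000_000))  # 4% is 80 million, which exceeds 20 million
```

The two figures cross at a turnover of 500 million euros; above that, the percentage-based ceiling grows with the size of the business.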
There's additional legislation for companies with customers in California (CCPA) and for those in the US healthcare sector (HIPAA). Abiding by all of these rules is time-consuming and can slow down the data analysis lifecycle.
Data mesh solves this problem because there's no need to transfer data to an external lake or warehouse, which can help organizations meet data sovereignty and residency requirements.
The decentralized nature of data mesh enables direct access to data within a data source. Data governance becomes the responsibility of the business teams, who understand their data better than any third-party system.
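Keeping governance with the domain means the domain decides what leaves its source and in what form. Below is a minimal sketch, assuming the customer domain shares records with other domains only after pseudonymizing the fields it classifies as personal data. The records, field names, and salt are hypothetical. Under GDPR, pseudonymized data generally still counts as personal data, so this reduces exposure rather than removing obligations.

```python
import hashlib

# Hypothetical records that stay in the customer domain's own source system.
CUSTOMERS = [
    {"customer_id": 1, "email": "ana@example.com", "country": "DE", "lifetime_value": 150.0},
    {"customer_id": 2, "email": "ben@example.com", "country": "FR", "lifetime_value": 80.0},
]
PERSONAL_FIELDS = {"email"}      # fields this domain classifies as personal data
OWNING_DOMAIN = "customers"


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a truncated salted SHA-256 digest (a pseudonym, not anonymization)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def read_customers(consumer_domain: str, salt: str = "demo-salt") -> list[dict]:
    """Serve records from the source; only the owning domain sees raw personal fields.

    The default salt is for illustration; a real one would be a managed secret.
    """
    rows = [dict(record) for record in CUSTOMERS]  # copies, so the source is never modified
    if consumer_domain != OWNING_DOMAIN:
        for row in rows:
            for field in PERSONAL_FIELDS:
                row[field] = pseudonymize(row[field], salt)
    return rows


print(read_customers("customers")[0]["email"])  # raw value for the owning domain
print(read_customers("marketing")[0]["email"])  # 16-character pseudonym for other domains
```

Because the same salt produces the same pseudonym every time, other domains can still join and count customers without ever seeing their email addresses.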
Under the centralized data management model, organizations need to make changes to data pipelines as they scale their operations.
As data sources and data sets increase, it can become increasingly difficult to manage these complex pipelines and generate the kind of analytics businesses demand.
Data mesh solves this problem by giving ownership to business teams. Its architecture can reduce the lag between capturing data and analyzing it, lowering latency and supporting real-time business intelligence.
As a result, teams can enhance scalability without impacting operational performance, data quality, or automation.
Centralized target systems can be costly and often come with vendor lock-in. Take Amazon Redshift, for example. It uses a pay-as-you-go model in which you pay for the compute and storage your warehouse consumes, so costs rise as you move more data into it.
That can become expensive over time, especially if you have large data sets from CRMs, ERPs, relational databases, transactional databases, SaaS tools, and other data platforms.
Data mesh solves this problem by keeping enterprise data within its data source without the need to transfer it elsewhere.
That not only helps you avoid data transformation problems (by some estimates, 70 to 80 percent of digital transformations fail) but still allows you to generate data about business activities as you would in a lake or warehouse, improving decision-making and performance.
Data teams can capture data events from their existing systems, even if data exists in silos, and deliver insights from self-service pipelines.
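Here is a minimal sketch of that idea, assuming the domain's existing order system emits insert, update, and delete events (for example, through change data capture). The event format is hypothetical; the function folds the events into a daily revenue view that the domain could publish as a data product.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical change events captured from a domain's existing order system.
events = [
    {"op": "insert", "order_id": "A1", "day": "2024-05-01", "amount": 120.0},
    {"op": "insert", "order_id": "A2", "day": "2024-05-01", "amount": 80.0},
    {"op": "update", "order_id": "A1", "day": "2024-05-01", "amount": 100.0},
    {"op": "delete", "order_id": "A2", "day": "2024-05-01", "amount": 80.0},
    {"op": "insert", "order_id": "B7", "day": "2024-05-02", "amount": 45.5},
]


def revenue_by_day(stream: Iterable[dict]) -> dict[str, float]:
    """Fold insert/update/delete events into total revenue per day."""
    current: dict[str, tuple[str, float]] = {}  # order_id -> (day, latest amount)
    for event in stream:
        if event["op"] == "delete":
            current.pop(event["order_id"], None)
        else:  # insert or update: the latest value wins
            current[event["order_id"]] = (event["day"], event["amount"])
    totals: defaultdict[str, float] = defaultdict(float)
    for day, amount in current.values():
        totals[day] += amount
    return dict(totals)


print(revenue_by_day(events))
```

Because the view is rebuilt from the latest state of each order, late updates and deletes are reflected correctly; a production pipeline would usually update the totals incrementally instead of recomputing them.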
Companies such as Zalando, Netflix, Intuit, VistaPrint, and others have implemented data mesh.
If your centralized data management ecosystem isn't providing you with the results you need, consider data mesh.
It's a new approach to operationalizing and analyzing data that doesn't rely on a data lake or warehouse for analytics, improving data ownership.
By incorporating data mesh into your organization, you can improve data governance and scalability and get more value from distributed data.
If you want to learn more about how data mesh can add value to your business, Adservio can help. We have supported organizations that want to improve observability, connectivity, and scalability in distributed data environments.
Contact one of our professionals today to learn how we can solve your data management problems.