Data mesh is a decentralized approach to data management where data is treated as a product and owned by cross-functional teams, promoting data accessibility, scalability, and quality across an organization through domain-oriented architecture and self-serve data infrastructure.
This guide will break down data mesh architecture, how it works, and what it means for your business operations. By the time you're done reading, you'll have the tools to scale your business and remain agile in the face of change.
Origin of Data Mesh
Zhamak Dehghani coined the term data mesh in 2019 to describe a new way of managing a company's crucial data.
If you've ever heard the phrase "don't put all your eggs in one basket," you already understand the logic of data mesh architecture. A data mesh refers to a decentralized and widely distributed approach to data ownership.
Core Principles of Data Mesh
While your data mesh architecture may have unique touches, the central principles are the same. A data mesh is a practical approach to data that makes sure no single entity has too much control or responsibility.
Below are the core principles of a data mesh and how they relate to similar tools such as the data lake or data fabric.
Domain-Oriented Decentralized Data Ownership and Architecture
It's important to define what a domain means in the context of a data mesh architecture. In this case, domain refers to any subset or environment of a business entity, which can include employees, suppliers, products, and customers.
Domain-oriented data ownership means no single entity has all the control over how data is stored, distributed, or accessed. The benefits of decentralized data management include:
- Data consumers receiving access to data products directly from data owners instead of having to sift through several middlemen to get what they need
- Reduction of bottlenecks in data pipelines to ensure more seamless communication across multiple entities
- Prevention of data silos between different domains, so data assets aren't gated and locked off from the people who need them
Treating Data as a Product
Treating data as a product is another powerful result of the data mesh, allowing businesses to get the most value out of their assets.
While the data being shuffled around is not technically B2C -- it's going to other employees of the business -- a product mindset is essential for keeping the system working smoothly. Workers also need a commitment to seamless communication and consistent organization to do their jobs well.
A few of the traits you need to ensure you're treating data as a product include:
- Making it easy to discover data in a centralized data catalog
- Consistent naming conventions within the organization to avoid confusion or wasted time
- Quality control features such as vetting data after basic verification methods
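The three traits above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataProduct` and `DataCatalog` classes, the `<domain>.<snake_case_name>` naming rule, and the sample product names are all hypothetical and not part of any specific data mesh tool.

```python
import re
from dataclasses import dataclass, field

# Hypothetical naming convention: <domain>.<snake_case_name>, e.g. "sales.daily_orders".
NAME_PATTERN = re.compile(r"^[a-z]+\.[a-z][a-z0-9_]*$")


@dataclass
class DataProduct:
    name: str
    owner_team: str
    description: str
    quality_checked: bool = False  # set to True once basic verification has passed


@dataclass
class DataCatalog:
    products: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Enforce the shared naming convention before anything is listed.
        if not NAME_PATTERN.match(product.name):
            raise ValueError(f"Name {product.name!r} breaks the naming convention")
        # Only vetted data products become discoverable.
        if not product.quality_checked:
            raise ValueError(f"{product.name!r} has not passed quality checks")
        self.products[product.name] = product

    def search(self, keyword: str) -> list:
        # Discovery: match the keyword against product names and descriptions.
        keyword = keyword.lower()
        return sorted(
            name
            for name, product in self.products.items()
            if keyword in name or keyword in product.description.lower()
        )


catalog = DataCatalog()
catalog.register(DataProduct("sales.daily_orders", "sales", "Orders per day", quality_checked=True))
catalog.register(DataProduct("logistics.shipments", "logistics", "Shipments linked to orders", quality_checked=True))
print(catalog.search("orders"))  # ['logistics.shipments', 'sales.daily_orders']
```

In this sketch a data product with a name like "Sales Orders", or one that has not been vetted, is rejected at registration, so consumers only ever discover data that follows the shared conventions.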
Defining Self-Serve Data Infrastructure
Let's dig a little deeper into the data mesh and learn about how a decentralized data source actually works in practice. A self-serve data infrastructure ensures each domain carries a certain level of responsibility in maintaining a data resource.
No matter the business domain, everyone has a role to play when it comes to filtering, cleaning, and loading their data. For example, dividing up this responsibility can look like giving data engineers the ability to manage data tech, while data analysts label and organize the data later. If your team is smaller, you may have more responsibilities on the shoulders of fewer people. A few of the tools and platforms you can use for decentralized data management are decentralized storage, encryption, and blockchains.
Breaking Down Federated Data Governance
Last but certainly not least, data mesh requires a high level of security to run properly. With so many domains all chipping in, everyone has to make sure they're committing to best practices to keep data usage safe.
Each domain can set its own standards and implementation depending on its needs. For example, one team may not have the ability to rename data, while another team may not be able to delete duplicates without prior approval.
No matter the type of governance, some form of governance will be needed, including consistently implemented standards, policies, and practices, as well as analyzing how your data product will be used and by whom.
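One way to picture federated governance is as an organization-wide baseline that each domain can adjust within limits. The sketch below assumes hypothetical policy names, domains, and a `LOCKED` set of rules that no domain may change. It mirrors the rename and duplicate-deletion examples above rather than any particular governance product.

```python
# Organization-wide baseline that every domain inherits (hypothetical policy names).
GLOBAL_POLICY = {
    "encrypt_at_rest": True,
    "allow_rename": True,
    "delete_duplicates_requires_approval": False,
}

# Each domain may adjust the baseline for its own needs.
DOMAIN_OVERRIDES = {
    "finance": {"allow_rename": False},
    "marketing": {"delete_duplicates_requires_approval": True},
}

# Rules that stay under central control and cannot be overridden by a domain.
LOCKED = {"encrypt_at_rest"}


def effective_policy(domain: str) -> dict:
    """Merge the global baseline with a domain's own overrides."""
    overrides = DOMAIN_OVERRIDES.get(domain, {})
    illegal = LOCKED.intersection(overrides)
    if illegal:
        raise ValueError(f"{domain} may not override: {sorted(illegal)}")
    return {**GLOBAL_POLICY, **overrides}


print(effective_policy("finance")["allow_rename"])  # False
print(effective_policy("marketing")["delete_duplicates_requires_approval"])  # True
```

A domain with no overrides simply receives the global baseline, which keeps standards consistent while still leaving room for local decisions.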
The Benefits of Data Mesh
Now that you understand what a data mesh is and what it's used for, it's time to break down the financial benefits for your business. Creating data products isn't enough -- you need to keep scalability, agility, and quality in mind.
Scalability
Scaling isn't easy. According to McKinsey, only 22% of businesses in the past ten years did it successfully. A data mesh gives you the ability to grow your business reliably without overhauling your budget.
Since the data mesh doesn't rely on a centralized data platform, responsibility is more evenly distributed. One of the major benefits of this distribution is the ability to innovate and redesign on-the-go, allowing data consumers to approach data in new ways.
The independent governance of a self-serve data platform offers a certain level of freedom not found in a central data structure. Even if one team is struggling with limitations or recent operational changes, other team members can continue moving relatively unhindered. This agility is another benefit you'll see in the next section.
Agility
When you create data products, you have to consider how that data will be downloaded or redistributed. A data mesh allows separate domains to approach data more quickly according to their best practices, reducing backlogs or wait times.
From querying to discovering, your domains are allowed to be more agile in their role and complete their tasks more efficiently. That doesn't mean anyone can do whatever they want -- there are still best practices and business limitations -- but there are certainly fewer roadblocks.
This agility leads to significant business innovation and market responsiveness. No matter where the industry is headed, you can trust your self-serve data platform to rise to the challenge.
Improved Data Quality
While the data lake is composed of raw data that hasn't been organized or filtered yet, the data mesh inherently demands higher data quality. Since you're treating data as data products, you hold it to the same standard as what you'd deliver to a customer.
So what does data quality look like in practice? Data quality can look like providing domain teams with data that's been properly analyzed and scrubbed of any faults, such as corrupted files or duplicate files. It can also look like giving unstructured data better organization so that people can more easily find it.
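As a rough illustration of that scrubbing step, the sketch below drops duplicate records and sets aside incomplete ones. It assumes records arrive as plain dictionaries and treats a record as corrupted when a required field is missing or empty. The `clean_records` function and the field names are hypothetical.

```python
def clean_records(records, required_fields):
    """Split records into clean ones and rejected ones.

    A record is rejected when any required field is missing or empty.
    A record is skipped as a duplicate when its required fields match
    a record that has already been accepted.
    """
    seen = set()
    clean, rejected = [], []
    for record in records:
        if any(record.get(name) in (None, "") for name in required_fields):
            rejected.append(record)
            continue
        key = tuple(record[name] for name in required_fields)
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        clean.append(record)
    return clean, rejected


raw = [
    {"id": 1, "customer": "Acme"},
    {"id": 1, "customer": "Acme"},    # duplicate
    {"id": 2, "customer": ""},        # incomplete, treated as corrupted
    {"id": 3, "customer": "Globex"},
]
clean, rejected = clean_records(raw, ("id", "customer"))
print(len(clean), len(rejected))  # 2 1
```

Keeping the rejected records instead of silently discarding them lets the owning domain team investigate where the faults came from.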
Improving data quality ensures your data consumers are able to do their jobs more efficiently, leading to a positive ripple effect throughout the business. Overlooking data quality in a data mesh can risk confusion, wasted storage, or data silos.
Enhanced Collaboration
From your data engineers to your central data team, everyone needs to be able to work together efficiently. A data mesh enhances collaboration across domain teams by giving everyone concrete tasks toward gathering, analyzing, and using data.
Since a data mesh requires ongoing maintenance to ensure the data is functional enough to use, collaboration is a key feature. All domain teams need to be in regular contact to ensure data products are maintaining a consistent level of quality for your business functions. Overall, a data mesh leads to improved cross-functional insights and data-driven decision-making.
Data Mesh vs. Other Data Architectures
The data mesh isn't the only architecture you can use for your business. Below are a few variations you should consider as you put together more secure and efficient data operations.
Data Mesh vs. Data Warehouses
At a glance, a data mesh and a data warehouse can look similar due to both dealing with large amounts of data products. However, a data warehouse is a more centralized approach, while a data mesh is decentralized.
A data warehouse is highly appealing since it simplifies how a business approaches data, consolidating everything into a single repository. This approach can be useful for smaller businesses who aren't sure if they want the size and scale of a data mesh yet. However, the downside of a data warehouse is how difficult it is to scale. It's also more limited in its functionality and isn't as agile as a data mesh.
A data mesh offers a decentralized approach where multiple domain teams take responsibility over how data is stored, categorized, distributed, and used.
Data Mesh vs. Data Lakes
The data lake and data mesh exist on almost opposite ends of the spectrum. A data lake offers a repository of raw, unstructured data, while a data mesh requires a higher level of organization.
Does that mean one is automatically better than the other? Not entirely. A data lake is very useful for smaller businesses that need to gather large amounts of data as quickly as possible. Its low barrier to entry and agile foundation make it a useful tool for growing businesses.
That said, a data lake still has limited functionality. Since its data is raw, issues such as corrupted files, duplicate files, and disorganized files will quickly make themselves known.
Data Mesh vs. Data Fabrics
Last but not least, we have the data fabric to take into account. While data mesh uses a decentralized foundation for distributing and using data, a data fabric requires a central data approach.
Data fabric not only requires a central data structure but is also much more automated than a data mesh. Data fabric architecture requires little oversight to collect data from multiple sources in one simple location for people to use. This hyper-automated approach can be very appealing to businesses that have a specific way of working and need to save as much time as possible.
Are You Ready for a Data Mesh? Key Questions To Ask
Now that you understand how a data mesh works and how it benefits your business, you may be wondering if you should implement it. Before you do, consider these key questions to get a better view of how you could benefit.
Organizational Scale and Complexity
The first question you should ask yourself concerning a data mesh is about your business size and complexity. Is your organization large and complex enough to benefit from a decentralized approach?
A few more questions you should ask are:
- Are you experiencing any scalability issues with your current data architecture?
- Do you have multiple domain teams that could use better cross-collaboration?
- Do any of your business plans for the next few years involve expanding your organization?
Data Management Challenges
Data management is a complex issue ranging from security issues to proper organization. Most organizations see data management as vital for success, so you can't afford not to ask the following questions.
- Are there any data bottlenecks, silos, or quality issues hindering your operations?
- Do you need improved scalability and agility in managing your data?
- Do you want a centralized approach or a decentralized approach?
Employees' Domain Expertise
A data mesh is only as good as the data consumers using it. If your domain expertise needs honing, a data mesh may be a little too much commitment.
Ask the following questions about your data platform team make-up to see if you should make the switch:
- Do your teams possess strong domain-specific knowledge?
- How much variety do you have in domain-specific knowledge?
- Do you think your teams are ready to take ownership of their data as products?
Even if you answer no to some of these questions, that doesn't mean you can't still craft a data mesh. Just make sure you don't move forward without addressing these issues, since a lack of readiness will absolutely become a problem later.
Cultural Readiness
A data mesh is just as much a philosophy as it is a data management system. Implementing one requires a level of commitment, collaboration, and determination to succeed.
- Is your organization’s culture aligned with decentralized principles?
- Are your teams willing to embrace a cultural shift towards data ownership and collaboration?
- Are your teams responsive and proactive when managing or distributing data?
Your Resource Availability
A data mesh requires more oversight than a data lake. There's no need to jump into a data mesh if you don't think you'll have the resources to maintain it.
- Do you have the resources to invest in a self-serve infrastructure or governance frameworks?
- Is your organization committed to providing ongoing support and improvement for improved data management?
- Do you know which resources you want to use to create a data mesh structure?
Implementing a Data Mesh Effectively
If you've answered the above questions about a data mesh and want to implement one, it's time to look at implementation. While creating a self-serve data platform can seem daunting due to its scale, it can be broken down into steps.
Assessment and Planning
Your first step is to treat your data as data products. It's a perspective shift that's part of the assessment and planning process of changing your organization's structure.
Evaluating organizational readiness can involve identifying key domains as well as stakeholders. What are you trying to achieve with your business and how is disorganized data keeping you from those goals?
Establishing Domain Teams
Your domain teams need to have defined roles and responsibilities when establishing a data mesh. One team may be in charge of gathering the data, while others may be responsible for analyzing it to make business decisions.
Training and onboarding your domain teams is vital for creating a self-serve data platform that functions smoothly. Well-planned training programs increase employee engagement. When you consider how a data mesh is a user-focused approach, it's in your best interest to keep said users invested.
Building A Self-Serve Data Infrastructure
Once you have a better idea of how your domain teams will function and what your overall business goals are, it's time to build a self-serve data infrastructure. This stage is where you start selecting different tools and platforms to help you manage your domain data.
You should prioritize tools that give you scalability and flexibility. For example, consider a cloud storage solution that allows you to expand as needed or provides deeper insights into your domain data. You can also look toward a security service that provides ongoing analysis of sensitive activity.
Governance and Compliance
Your data products need consistent governance and compliance to ensure best practices across the board. The last thing you want is the wrong people accessing your domain data or mishandling it.
Developing governance frameworks involves creating a set of standards for each domain team. For example, you can provide certain accesses and permissions depending on a person's team role. Taking the time to establish policies for data quality, security, and interoperability will ensure your domain data remains safe and usable.
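Role-based permissions like these are often expressed as a simple mapping from role to allowed actions. The following is a minimal sketch, assuming three hypothetical roles and a small set of actions. A real deployment would rely on the access controls of whichever platform you choose.

```python
from enum import Enum, auto


class Action(Enum):
    READ = auto()
    WRITE = auto()
    DELETE = auto()


# Hypothetical mapping of team roles to the actions they may perform.
ROLE_PERMISSIONS = {
    "data_engineer": {Action.READ, Action.WRITE, Action.DELETE},
    "data_analyst": {Action.READ, Action.WRITE},
    "data_consumer": {Action.READ},
}


def is_allowed(role: str, action: Action) -> bool:
    # Unknown roles get no permissions at all (deny by default).
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("data_analyst", Action.WRITE))    # True
print(is_allowed("data_consumer", Action.DELETE))  # False
```

Denying by default means a new or misspelled role cannot accidentally gain access to domain data.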
Iterative Implementation and Improvement
As you figure out the function of your data mesh, you don't have to go all out with the implementation. It's smart to start off with pilot projects as you get your feet wet with data products and the new expectations that come with them.
Starting with pilot projects allows you to gather feedback and continuously improve. You may find you actually wanted a central data team or realize your data scientists need their own domain team. While you can absolutely learn from other businesses and how they approach data, some knowledge only comes from trial-and-error.
Why InterSystems Is the Best Choice for Creating a Data Mesh
Putting together a data mesh doesn't have to be a solo project. Indeed, the data mesh is inherently designed to be a collaborative effort that transforms your data products through a humanistic approach.
We provide you with the means to access data and utilize it more effectively with comprehensive data solutions. We provide advanced capabilities in data management and integration to build scalable, reliable data infrastructures. Our cloud-first data platform provides you with the ability to access data conveniently, safely, and consistently.
InterSystems’ advanced data platforms, including support for decentralized data management, facilitate the creation and maintenance of a self-serve data infrastructure. Over the years we've helped businesses such as healthcare facilities, shipping companies, and investment banks manage and organize their data.
Chadwicks Group, Murata Machinery, and Chess Logistics Technology are a few past clients we have helped with data silos and data-driven decisions. Whether you're concerned about the viability of your data products or want to upgrade your data lakes, we're here to help.
InterSystems is dedicated to continuous improvement and staying at the forefront of data technology. We'll provide your business with ongoing support and collaboration to ensure the success of your data mesh initiatives.
Contact InterSystems when you're ready to craft a data mesh paradigm.