HPE Ezmeral: Uncut
Ellen Friedman

New way to tame your data giant: HPE Ezmeral Data Fabric’s distributed file system

Data is the lifeblood of the modern enterprise. To stay competitive, a business must give their analytics and AI applications access to the right data at the right time.


It sounds simple, but providing flexible and fast data access can be challenging – especially for businesses with massive amounts of data from a variety of sources. Add to that challenge the complexity of diverse applications spread across many locations with high rates of change. What, then, can make all this simpler?

The answer is a modern data infrastructure that serves as a unifying data layer across your enterprise, from core data centers to cloud and multicloud environments and out to the edge. HPE Ezmeral Data Fabric, a key part of the HPE Ezmeral Software Portfolio, is designed to do just that. The data fabric’s distributed file system is engineered for data storage, access, management, and motion – all while maintaining enterprise SLAs for data resilience and enterprise security policies – even up to exabyte scale. Now the data fabric’s distributed file system is also available as a stand-alone option.

Shown in Figure 1, this software-defined solution is hardware agnostic and provides highly flexible and fast data access, making it an excellent foundation for diverse data-driven applications. This design not only meets current needs: HPE Ezmeral Data Fabric also lets you expand your system seamlessly, without re-architecting as it grows in size and variety. Watch the video interview “Scaling without Scaling IT” with Ronald Van Loon and Ellen Friedman to learn more about how data infrastructure affects practical considerations for scalability.

Figure 1: HPE Ezmeral Data Fabric software data storage and management solution

As a fully read/write and extremely scalable distributed file system, the data fabric is broadly usable. Read on to see how the powerful capabilities of HPE Ezmeral Data Fabric – scalability with resilience, multiple data access methods, platform-level management, and platform-level data motion – can tame your data giant.

Seamless scalability with extreme resilience 

How does HPE Ezmeral Data Fabric’s distributed file system handle exabyte scale – and even billions of files or more – with high resilience and without sacrificing performance? 

Resilience comes in part thanks to robust self-healing. File data in data fabric is, by default, divided into pieces and replicated three times. These replicas are distributed across the cluster for safety. If a disk or machine fails, the system automatically recreates any missing data from the remaining replicas. To users and applications, it appears as though nothing happened, and workloads continue uninterrupted.
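The self-healing behavior described above can be illustrated with a toy simulation. This is purely conceptual – the real file system replicates storage containers across disks, not Python sets across strings – but it shows why three-way replication lets a cluster lose a node and rebuild full redundancy from the surviving copies:

```python
import random

REPLICATION = 3  # the data fabric's default replication factor

def place_chunks(num_chunks, nodes):
    """Assign each chunk of file data to REPLICATION distinct nodes."""
    placement = {}
    for chunk in range(num_chunks):
        placement[chunk] = set(random.sample(nodes, REPLICATION))
    return placement

def fail_node(placement, dead_node, live_nodes):
    """Simulate self-healing: re-create lost replicas on surviving nodes."""
    for chunk, replicas in placement.items():
        if dead_node in replicas:
            replicas.discard(dead_node)
            # Pick a surviving node that does not already hold this chunk.
            candidates = [n for n in live_nodes if n not in replicas]
            replicas.add(random.choice(candidates))

nodes = [f"node{i}" for i in range(6)]
placement = place_chunks(num_chunks=100, nodes=nodes)

# A machine fails; the system restores full replication from remaining copies.
live = [n for n in nodes if n != "node0"]
fail_node(placement, "node0", live)
assert all(len(r) == REPLICATION and "node0" not in r
           for r in placement.values())
```

Because every surviving node holds replicas of many different chunks, the rebuild work is spread across the whole cluster rather than falling on a single spare machine.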

Data fabric’s ability to self-heal means you can add capacity at any time. The data fabric will automatically rebalance data to populate new hardware, which helps with scalability and performance.

Extreme scalability with stellar performance also relies on the fact that metadata for files and directories is fully distributed across the cluster. This design helps protect against data loss and avoids traffic jams that would diminish performance at large scale. 

Flexibility in data access methods

To serve as a unifying data infrastructure across an enterprise, HPE Ezmeral Data Fabric provides the flexibility of multiple data access methods. This means modern applications, including AI and machine learning, can directly access data or write to the data fabric without having to copy data for use on specialized systems.

Furthermore, thanks to traditional access methods such as POSIX and NFS, legacy applications can also use the data fabric, sharing data with modern applications, as shown in Figure 2. In addition, containerized applications orchestrated by Kubernetes can access the data fabric, thanks to the included CSI (Container Storage Interface) driver.
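As a hedged sketch of the Kubernetes path, a containerized application could request data fabric storage through an ordinary PersistentVolumeClaim backed by the CSI driver. The storage class name below is an assumption – the actual name depends on how the driver is installed in your cluster:

```yaml
# Hypothetical PVC requesting storage provisioned by the data fabric CSI driver.
# The storageClassName is an assumed example; check your cluster's configured classes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datafabric-pvc
spec:
  accessModes:
    - ReadWriteMany        # data fabric volumes are fully read/write and shareable
  resources:
    requests:
      storage: 100Gi
  storageClassName: ezmeral-datafabric   # assumed name of the CSI-backed class
```

A pod then mounts the claim like any other Kubernetes volume, so containerized workloads see the same shared data as POSIX and NFS clients.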

Figure 2. Multi-API capabilities of the HPE Ezmeral Data Fabric file system

Sharing data and sharing resources not only reduces cost and complexity, it also avoids data silos and encourages collaboration. Furthermore, better data access and the ability to handle huge numbers of files, as well as large amounts of data, can improve performance. The recently published customer case study, “Accelerating Data Insight for a Better Work Life,” demonstrates this point and describes how New Work SE solved performance problems and took advantage of multi-tenancy using HPE Ezmeral Data Fabric.

With millions or billions of files being accessed by many applications, it’s especially important to have a convenient way to find data. HPE Ezmeral Data Fabric provides a global namespace that allows applications or users to refer to data via the same pathname. The global namespace lets an application access data stored in an HPE Ezmeral Data Fabric cluster locally or remotely in a data fabric cluster in another location. Remote access could include a cloud deployment of the data fabric (Figure 3).
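To make the global namespace concrete, here is an illustrative sketch (not runnable without a cluster). It assumes the POSIX client has mounted clusters under the conventional /mapr mount point, and the cluster and directory names are invented for the example:

```shell
# Illustrative only: assumes clusters are mounted under /mapr via the POSIX client;
# cluster names (dc-cluster, edge-cluster) and paths are invented for this sketch.
ls /mapr/dc-cluster/projects/telemetry      # data in the core data center
ls /mapr/edge-cluster/projects/telemetry    # same pathname style for a remote cluster

# Ordinary file tools work across clusters through the same namespace:
cp /mapr/edge-cluster/projects/telemetry/today.log \
   /mapr/dc-cluster/projects/telemetry/
```

Because every location is addressed by a consistent pathname, applications do not need cluster-specific connection logic to reach local, remote, or cloud-deployed data.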

Figure 3. HPE Ezmeral Data Fabric global namespace for local and remote data access

Having a global namespace is not merely a convenience; it also makes it easier to work across multiple locations, including a hybrid cloud and on-premises design.

The data fabric’s self-healing, automatic rebalancing, multi-API access, and global namespace capabilities are just part of how this data infrastructure makes it easier to handle data management at huge scale without overburdening developers or overwhelming IT teams. Let’s look deeper at why platform-level data management makes a big difference. 

Powerful platform-level management 

To make it practical to work efficiently with very large data systems, it is important to manage data at the platform level, not at the application level. HPE Ezmeral Data Fabric offers the data fabric volume – a logical unit in the data fabric – that is like a stretchable bag of data rather than a rigid box of predetermined size. (This is very different from a traditional block storage volume.) A data fabric volume grows or shrinks to fit the data it holds. Think of the data fabric volume as a directory with superpowers that lets you easily control data placement on a cluster (shown in Figure 4).

Figure 4. Data fabric volume (transparent triangle) – an organizational unit

By default, a data fabric volume spans a cluster. Notice in Figure 4 that all three replicas of a particular storage unit are held in the same volume but are randomly distributed to different machines as a safety measure. You can also use a volume for customized data placement, such as on specialized hardware (represented in Figure 4 as darker blue boxes).
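Volume creation and placement are administrative operations, typically done with the data fabric CLI. The following is a hedged sketch (it requires a running cluster; volume names, mount paths, and the topology label are invented for the example):

```shell
# Hedged sketch using the data fabric CLI; names and topology paths are invented.
# Create a volume mounted at /projects/analytics; by default it spans the cluster.
maprcli volume create -name proj.analytics -path /projects/analytics

# Place a volume's data on a labeled subset of nodes (e.g. specialized hardware),
# assuming an administrator has defined a node topology such as /data/fast.
maprcli volume create -name proj.hot -path /projects/hot -topology /data/fast
```

Because the volume is the unit of placement, applications keep using ordinary pathnames while administrators control where the underlying data lives.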

The data fabric volume is also the basis for making true point-in-time versions of data via snapshots and for data movement between clusters via mirroring (described in the next section). Furthermore, the data fabric gives you fine-grained control over who does and who does not have access to data by setting Access Control Expressions (ACEs) at the level of volumes, files, or directories.
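Snapshots and ACEs are likewise set at the volume level through the CLI. Again a hedged sketch, not runnable without a cluster; the volume, snapshot, user, and group names are invented:

```shell
# Hedged sketch; volume, snapshot, user, and group names are invented.
# Take a true point-in-time snapshot of a volume:
maprcli volume snapshot create -volume proj.analytics -snapshotname nightly-2024-01-15

# Restrict reads on the volume to user alice or members of group analysts,
# expressed as an Access Control Expression (ACE):
maprcli volume modify -name proj.analytics -readAce 'u:alice | g:analysts'
```

Setting access control once at the volume level means every file and directory inside inherits a consistent policy, instead of each application enforcing its own.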

Data logistics: platform-level data motion

HPE Ezmeral Data Fabric provides efficient data mirroring between geographically separated data fabric clusters or from on-premises or edge to a cloud deployment of the data fabric. This works even with public cloud vendors, such as AWS or Azure, giving you control over your own data using familiar pathnames (again, thanks to the global namespace). 

The first step in mirroring a volume is to take a local snapshot of the volume. This step ensures the mirror is an exact copy of the source data at an exact moment in time. In addition, after the initial mirror copy is set up, updates occur incrementally, making mirroring fast and efficient.
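The mirroring workflow sketched above maps to a pair of CLI operations. This is a hedged illustration (it assumes two reachable clusters; the cluster and volume names are invented):

```shell
# Hedged sketch; cluster and volume names are invented.
# On the destination cluster, create a mirror volume that points at the source:
maprcli volume create -name proj.analytics.mirror -path /mirrors/analytics \
  -type mirror -source proj.analytics@source-cluster

# Start (or resume) mirroring; after the initial copy, only changed data
# is transferred, which keeps updates fast and efficient:
maprcli volume mirror start -name proj.analytics.mirror
```

Because each mirror pass starts from a snapshot, the destination is always a consistent point-in-time copy of the source volume, never a partially updated mix.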

Data fabric volumes provide a powerful way to manage data, including letting you choose what data will stay at an edge cluster or what data will be moved back via mirroring to a data center. At the data center, data from many small edge clusters can be aggregated, analyzed, or used with AI applications, as shown in Figure 5. 

Figure 5. Mirroring between data fabric clusters

A real-world example of the power of mirroring for edge systems with extreme amounts of data is described in the customer case study, “Accelerating Autonomous Car Development with Ready Access to Global Data Fabric.”

Mirroring could also be used to maintain a second cluster at a remote location as part of a data recovery plan. No matter why you are moving data, automatic load balancing capabilities of the data fabric help ensure mirroring will not interfere with the running of business-critical applications.

Next steps

To learn more about how HPE Ezmeral Data Fabric improves large-scale systems:

Related articles:


 

Ellen Friedman

Hewlett Packard Enterprise

www.hpe.com/containerplatform

www.hpe.com/mlops

www.hpe.com/datafabric

About the Author


Ellen Friedman is a principal technologist at HPE focused on large-scale data analytics and machine learning. Prior to her current role at HPE, Ellen worked at MapR Technologies for seven years. She is a committer for the Apache Drill and Apache Mahout open source projects and a co-author of multiple books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, and the Practical Machine Learning series.