At the turn of the century, shared-nothing storage architectures were instrumental in helping organizations scale beyond what was then possible with scale-up storage—and in helping them manage the explosion of data, as well as the need for simple and scalable storage capacity.
These systems were the perfect answer to the limited dual-controller, small-volume storage systems that came before them—and helped customers achieve breakthrough levels of speed, resilience and scale.
The impact of the adoption of shared-nothing storage was nothing less than tectonic. Its influence can be felt in many of the popular storage technologies deployed today, including:
- Much of the world’s web and cloud storage infrastructure such as AWS and Dropbox
- File systems—products like Dell EMC Isilon and Pure Storage’s FlashBlade
- Object storage—solutions such as CEPH and IBM Cleversafe
- Big data architectures such as Apache Hadoop and Splunk
- Virtualization, with products such as Nutanix and vSAN
- And even modern backup appliances such as Rubrik and Commvault Hyperscale
For 20 years, billions of dollars of infrastructure have been deployed as shared-nothing clusters. While these products have succeeded, their success has also cast a light on the problems they have not solved. With each exabyte deployed, the original objectives of shared-nothing cluster architectures drift further from what these systems actually deliver.
In this article, we will take a high-level look at each objective's premise and its actual reality:
Performance Scalability
The Premise
Shared-nothing clusters make it easy to scale performance: just add more nodes (CPU + disk) to the cluster.
The Reality
Because each cluster node must coordinate data and metadata activity with its peers, the cross-talk required for cluster coherency and storage rebuilds has limited effective performance scaling to only a few dozen nodes. As a system grows, so does the chatter within the cluster, and most commercial scale-out storage appliances hit diminishing returns well before customers see linear performance gains at scale.
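To make the diminishing-returns argument concrete, here is a minimal sketch in Python. The per-peer coordination cost is an assumed parameter, not a measured value, so the numbers only illustrate the shape of the curve:

```python
# A toy scaling model (an illustration, not vendor data): assume each node
# spends a fixed fraction of its cycles coordinating with every peer for
# coherency and rebuild traffic.

def effective_throughput(nodes: int, per_node_perf: float = 1.0,
                         coord_cost_per_peer: float = 0.01) -> float:
    """Usable cluster throughput after subtracting coordination overhead."""
    overhead = min(1.0, coord_cost_per_peer * (nodes - 1))  # chatter grows with peer count
    return nodes * per_node_perf * (1.0 - overhead)

if __name__ == "__main__":
    for n in (4, 16, 32, 48, 64, 96):
        print(f"{n:3d} nodes -> {effective_throughput(n):5.1f}x single-node throughput")
```

With these assumed parameters, throughput peaks at around 50 nodes and then declines as coordination overhead outgrows the added hardware.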
Capacity Scalability
The Premise
Shared-nothing products eliminated the islands of storage created by dual-controller architectures by aggregating storage across nodes, making it simple to add capacity by adding more nodes.
The Reality
The tight coupling of shared-nothing storage to a node's CPUs, and shared-nothing's focus on HDDs, has resulted in pools and tiers of infrastructure that are not globally scalable. Data becomes constrained within a tier, and volumes, directories and data management operations typically do not span these pools. As a result, there is still data segmentation and inefficiency within shared-nothing systems, even though they were developed to eliminate islands of infrastructure.
The existence of storage tiering in classic shared-nothing architectures still forces customers to wrestle with the capacity and performance-sizing problems these systems were designed to solve.
Multi-Tenant Cloud Storage Infrastructure
The Premise
Shared-nothing storage systems are built on the same concepts used by the world's leading cloud services and are designed to serve the requirements of many applications at the same time.
The Reality
Because shared-nothing systems broadcast the I/O requests made to any node across multiple nodes within a cluster, powerful applications can—and do—inflict significant pain on multi-tenant environments. For this reason, many organizations will deploy different shared-nothing clusters for different users or different purposes—thereby creating the islands of storage that these systems were designed to eliminate.
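A toy model helps illustrate the noisy-neighbor effect. The tenant names and IOPS figures below are hypothetical, and the proportional-throttling assumption is a simplification of how an unisolated cluster degrades:

```python
# Noisy-neighbor sketch: when every I/O fans out across the cluster, a heavy
# tenant consumes a slice of every node, so quieter tenants see cluster-wide
# degradation instead of isolated impact.

def per_tenant_share(tenant_load: dict, cluster_capacity: float) -> dict:
    """Fraction of each tenant's demanded IOPS that the cluster can serve
    when all I/O is spread across every node with no tenant isolation."""
    total = sum(tenant_load.values())
    if total <= cluster_capacity:
        return {t: 1.0 for t in tenant_load}
    # Demand exceeds capacity: every tenant is throttled proportionally,
    # because no node can fence off the heavy tenant's broadcast traffic.
    return {t: cluster_capacity / total for t in tenant_load}

if __name__ == "__main__":
    load = {"analytics-burst": 900_000, "web-app": 50_000, "backup": 50_000}  # demanded IOPS
    for tenant, served in per_tenant_share(load, cluster_capacity=500_000).items():
        print(f"{tenant}: {served:.0%} of demanded IOPS served")
```

In this example the analytics burst drags every tenant down to half of its demanded throughput, which is exactly why operators end up standing up separate clusters.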
“Global” Storage Logic
The Premise
“One.” One namespace, one simple management experience, one set of algorithms that scale across the cluster. Everything is simple, scalable and efficient.
The Reality
The tight coupling of CPUs to disks limits the scale of the storage algorithms these products can run, and that limit keeps them from fully addressing the needs of modern applications:
- No system was built to globally buffer writes and manage wear across thousands of low-endurance, high-capacity SSDs—forcing vendors to use expensive flash
- Erasure code stripes don't scale beyond 10-20 data blocks, resulting in overhead that is often 20-40% of the cumulative cluster capacity (a worked sketch follows below)
- No shared-nothing system was ever built with a global data reduction dictionary, because these designs tightly couple data reduction indexes to the DRAM of a single node. Vendors have wrestled with scale as a result: replicating the index across cluster nodes quickly becomes prohibitively expensive.
The result: shared-nothing flash solutions are too expensive for broad adoption, still costing roughly 20 times as much as HDD storage, and they force customers to keep trading performance against capacity.
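The erasure-coding arithmetic is easy to check with a short sketch. The stripe geometries below are illustrative assumptions, not any vendor's published layout:

```python
# Back-of-the-envelope parity overhead for k+m erasure-code stripes.

def parity_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Parity capacity as a fraction of data capacity in a k+m stripe."""
    return parity_blocks / data_blocks

if __name__ == "__main__":
    # Narrow stripes typical of node-based failure domains.
    for k, m in ((10, 4), (16, 4), (20, 4)):
        print(f"{k}+{m} stripe: {parity_overhead(k, m):.0%} capacity overhead")
    # A hypothetical much wider stripe, feasible only when stripes are not
    # bound to a handful of nodes, pushes overhead into the single digits.
    print(f"100+4 stripe: {parity_overhead(100, 4):.0%} capacity overhead")
```

A 10+4 stripe burns 40% of data capacity on parity and a 20+4 stripe burns 20%, which is the 20-40% range cited above.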
Simplicity
The Premise
Scaling is easy: just add nodes of CPU and disk, and both performance and capacity grow linearly, so you don't need to choose one or the other.
The Reality
By tightly coupling CPUs, RAM and disks in a node architecture, shared-nothing clusters have created a Cambrian explosion of node types as customers navigate tradeoffs between IOPS, throughput and capacity. Most shared-nothing products offer 5-10 different node options, forcing customers to make hard-coded, compromised decisions about how to size performance and capacity. Then they must hope they never need more performance for their capacity, because the capacity they purchase is held hostage to a specific CPU.
At times, this problem produces comical outcomes. The most famous example of shared-nothing node-choice complexity is a 46,763-page price list.
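A simplified sizing exercise shows how fixed node ratios strand resources. The node names, capacities and IOPS figures below are invented for illustration:

```python
import math

# When CPU and capacity ship together in a node, meeting both a capacity
# target and an IOPS target means over-buying whichever resource the chosen
# node type has in excess.

NODE_TYPES = {            # name: (usable TB per node, kIOPS per node)
    "capacity-node":    (200, 50),
    "performance-node": (40, 400),
}

def nodes_needed(capacity_tb: float, kiops: float, node: str) -> int:
    tb, perf = NODE_TYPES[node]
    # Both dimensions must be satisfied, so the larger node count wins.
    return max(math.ceil(capacity_tb / tb), math.ceil(kiops / perf))

if __name__ == "__main__":
    need_tb, need_kiops = 1000, 2000  # target: 1 PB usable, 2 million IOPS
    for sku, (tb, perf) in NODE_TYPES.items():
        n = nodes_needed(need_tb, need_kiops, sku)
        print(f"{sku}: {n} nodes -> {n * tb} TB, {n * perf} kIOPS "
              f"(target: {need_tb} TB, {need_kiops} kIOPS)")
```

Here the capacity-heavy SKU over-buys capacity eight-fold to hit the IOPS target, while the performance SKU over-buys IOPS five-fold to hit the capacity target; either way the customer pays for resources they cannot use.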
The promise of shared-nothing has been delivered, and the world has benefited. At the same time, we now have no illusions about what these systems can and cannot do.
Disaggregated and Shared Everything (DASE) Storage
The emergence of new foundational storage technologies such as NVMe-over-Fabrics, Storage Class Memory and low-endurance, low-cost flash media has made it possible to reimagine what storage can be in the modern era. Now, the next generation of disaggregated and shared-everything (DASE) storage is breaking many of the tradeoffs and compromises that have emerged through the global adoption of shared-nothing architectures.
The table below highlights DASE benefits and summarizes the key architectural differences:
|  | Shared-Nothing Storage | DASE Storage |
| --- | --- | --- |
| Performance Scalability | Cluster cross-talk limits practical performance, and performance scaling starts to tail off after a few dozen nodes. | Every CPU independently mounts each piece of media in the cluster. DASE systems eliminate east-west cluster traffic to deliver linear scale. |
| Capacity Scalability | Volumes of data are locked into specific pools of HW, so each has its own data management and data protection boundaries. | An asymmetric cluster architecture breaks down the barriers between multiple generations of storage and pools them into one large resource group, eliminating data boundaries and preserving the balance of capacity and performance as you scale. |
| Multi-Tenancy | Customers are still building separate clusters to meet the diverse and competing performance needs of applications. | With east-west traffic eliminated, it becomes easy to segment the CPUs of the cluster into dynamically scalable pools that can be allocated to demanding applications and tenants; isolate traffic between competing applications; and still serve all of them from one global storage system. |
| Global Logic | Shared-nothing systems still rely on limited logic for managing volumes of data, narrow data protection stripes and limited data reduction. The result is often not much different from what is achievable with dual-controller designs. | Exabyte-scale DASE architecture features global QLC flash management, global data protection and global data reduction, which combine to dramatically recalibrate the effective acquisition cost of flash, making it possible to afford flash for all of your data and to eliminate the HDD from the data center. |
| Simplicity | Customers struggle to choose among a variety of storage node makes and models when building out their clusters. With SSD options that cost 10-20 times what their HDD tiers cost, tiered storage systems further compound the complexity of sizing and scaling storage, leaving HDD-based data on slow, legacy media. | CPUs can be scaled independently of capacity, resulting in only one server model and one enclosure model per generation of HW. Customers can right-size infrastructure to their requirements and add performance without adding more storage capacity. Furthermore, when it is possible to afford flash for all of your data, storage no longer needs to be sized for performance or capacity. |
Talking to Storage Vendors
When it comes to the broader conversation about the future of storage, it's important that your storage vendor understands the fundamental differences between newer and previous generations of architecture. Admittedly, storage concepts can be quite technical, and storage companies can at times make it difficult to see the forest for the trees.
Whether you currently have a storage vendor or are seeking a new one, here are a few questions to ask them as the shared-nothing era winds down:
- Have you innovated around flash and delivered a level of storage efficiency that makes it possible to afford the total cost of acquisition of a true all-flash data center?
- Do you still advertise complex tiered storage architectures that force continual assessment of the value of performance versus the cost of capacity?
- Is your current storage architecture able to scale performance and capacity linearly, without introducing any cluster cross-talk that would diminish performance returns at scale?
- Is it possible to safely consolidate high-performance and heterogeneous applications onto a single storage cluster, where the IO from any power-user’s application can be fenced off from every other application?
- Lastly, is scale-out really simple if there are a multitude of node types to configure and combine into a shared-nothing cluster? If you guess wrong and under-size performance, what is the mechanism to correct this decision? What is the cost of over-estimating performance with a vendor’s expensive high-performance flash nodes?
It’s closing time for the shared-nothing era—welcome to this new beginning!
Source: Vast Data