The high speed of flash storage often makes it easy to justify its high price. But with Dell’s new approach to flash tiering, that justification may no longer be necessary. Though the latest release of Dell’s Compellent Storage Center includes new hardware goodies such as a very high-density 3.5-inch SAS enclosure and a raft of updates to the array’s management and host integration software, the really big news is support for automated tiering between write-optimized SLC (single-level cell) SSDs and read-optimized MLC (multilevel cell) SSDs.
The Compellent system could already automate tiering between expensive, low-capacity SLC SSDs and spinning disk. However, this new blend of the two predominant SSD techs allows Dell to claim it can deliver an all-flash solution for the price of disk. Using list price as a comparison, the same money that you might have spent on a Compellent array with 72 146GB 15,000-rpm SAS disks will now buy you a similarly licensed array with six 400GB write-optimized SLC SSDs and six 1.6TB read-optimized MLC SSDs.
Even better, that pure SSD configuration can deliver three times the transactional performance (using a TPC-C benchmark), 85 percent less latency, and 15 percent more capacity while consuming 50 percent less power and rack space than its disk-based counterpart. In other words, if you have the right balance of performance and capacity requirements to effectively leverage it, this innovation could save you a wad of cash, deliver a huge performance windfall, or both.
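The capacity side of that comparison is easy to sanity-check. The sketch below simply totals the raw drive capacity of the two configurations described above; it assumes raw gigabytes only and ignores RAID overhead, hot spares, and formatting, so it is a rough check rather than a reconstruction of how the 15 percent figure was derived.

```python
# Raw capacity of the two list-price-equivalent configurations from the text.
# Assumption: raw GB only; RAID overhead, spares, and formatting are ignored.
def raw_capacity_gb(drives):
    """drives is a list of (count, size_gb) tuples."""
    return sum(count * size_gb for count, size_gb in drives)

disk_config = [(72, 146)]             # 72 x 146GB 15,000-rpm SAS disks
flash_config = [(6, 400), (6, 1600)]  # 6 x 400GB SLC + 6 x 1.6TB MLC SSDs

disk_gb = raw_capacity_gb(disk_config)
flash_gb = raw_capacity_gb(flash_config)
gain_pct = (flash_gb / disk_gb - 1) * 100

print(f"disk:  {disk_gb} GB raw")
print(f"flash: {flash_gb} GB raw")
print(f"flash advantage: {gain_pct:.1f}% raw")
```

On raw numbers alone the flash configuration comes out roughly 14 percent ahead, which is in the same neighborhood as the quoted figure.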
Dell’s novel approach to tiering does come with a catch. However, understanding this gotcha and its potential effects in the field requires a deeper understanding of SSDs in general, Dell’s Data Progression tiering software in particular, and how Dell has leveraged both in this new release.
A crash course in SSDs
Instead of using mechanical spinning platters to store data magnetically, SSDs use solid-state flash memory to do the job. Although flash memory is used in all kinds of devices from iPods to USB sticks, the kinds you’ll find in enterprise storage are typically either single-level cell SSDs or multilevel cell SSDs. The differences between the two boil down to the typical balancing act among performance, capacity, and expense.
Generally speaking, there are two enemies of solid-state storage. The first, generally referred to as “write endurance,” is that each cell within an SSD can endure a fairly specific number of so-called program-erase cycles before it will no longer be able to store data accurately. Write-endurance figures are generally reflected in “full device writes per day” — a metric that gives a user an idea of the overall lifecycle of a device.
The second enemy of solid-state storage, a phenomenon referred to as the “write cliff,” is associated with the fact that each cell must undergo a (relatively) time-consuming erasure process before it can be written to. If the background process that erases unallocated cells fails to keep up with the write load the device is experiencing, the device will run out of pre-erased cells, and write performance will fall through the floor.
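A toy model makes the write cliff easier to picture. The sketch below is a deliberately simplified simulation, not a model of any real SSD controller: the pool size, erase rate, and write rate are made-up numbers, and it assumes the background eraser always has stale cells available to reclaim.

```python
def simulate_write_cliff(spare_cells, erase_rate, write_rate, ticks):
    """Return how many writes land on pre-erased cells during each tick.

    spare_cells: size of the pre-erased pool (hypothetical units)
    erase_rate:  cells the background eraser can reclaim per tick
    write_rate:  cells the host wants to write per tick
    """
    served = []
    pool = spare_cells
    for _ in range(ticks):
        # Background erasure refills the pool, up to its maximum size.
        pool = min(spare_cells, pool + erase_rate)
        # Only writes that find a pre-erased cell complete at full speed.
        fast = min(pool, write_rate)
        pool -= fast
        served.append(fast)
    return served

# Eraser keeps up: full-speed writes on every tick.
print(simulate_write_cliff(100, 30, 30, 8))
# Eraser falls behind: the pool drains, then throughput drops to the erase rate.
print(simulate_write_cliff(100, 10, 30, 8))
```

In the second run the device sustains its full write rate only until the pool is exhausted, after which it can write no faster than it can erase.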
To combat these two problems, both kinds of SSDs — SLC and MLC — are typically equipped with more raw capacity than they advertise. This allows the device to spread out write operations over a larger number of cells, which increases overall device endurance and gives the device more cells to keep empty to absorb large write workloads. This slack capacity and the intelligence built into the SSD’s controller to manage it are what really separate consumer SSDs from those used in enterprise storage devices. (They also explain the capacity differentials you’ll find when you shop the two markets.)
Further, SLC and MLC devices are fundamentally different in that SLC devices store only a single bit per cell while MLC devices store two or more bits per cell. This means that SLCs use fewer transistors per cell as compared to MLCs, but more transistors to store the same amount of data. Thus, SLC devices can sustain a much larger write workload (usually 25 to 30 full writes per day versus three per day for MLC) and absorb writes three to five times faster, but they are also substantially smaller and more expensive than MLC devices. However, SLC and MLC SSDs are nearly equal in terms of read performance (with MLC perhaps 2 to 3 percent slower), a fact that’s crucial to understanding Dell’s approach to flash tiering.
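To put the endurance gap in concrete terms, the sketch below converts "full device writes per day" into a daily write budget. It pairs the 400GB and 1.6TB drive sizes mentioned earlier with the typical endurance figures quoted above (25 and three full writes per day); that pairing is an assumption for illustration, since the actual ratings of the drives Dell ships are not stated here.

```python
def daily_write_budget_tb(capacity_gb, full_writes_per_day):
    """Data (in TB) a drive is rated to absorb per day at its endurance rating."""
    return capacity_gb * full_writes_per_day / 1000

# Assumed pairing of drive sizes with the typical endurance figures in the text.
slc_tb = daily_write_budget_tb(400, 25)
mlc_tb = daily_write_budget_tb(1600, 3)

print(f"400GB SLC at 25 writes/day: {slc_tb:.1f} TB/day")
print(f"1.6TB MLC at 3 writes/day:  {mlc_tb:.1f} TB/day")
```

Under those assumptions the SLC drive tolerates about twice the daily write volume of an MLC drive four times its size, which is why it makes sense to land incoming writes on SLC.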
The Compellent’s secret sauce
Even before its acquisition by Dell in 2011, the Compellent system’s main claim to fame was its Data Progression (DP) tiering software. DP’s job is to free up capacity in faster and more expensive tiers of storage by moving data into progressively slower and more economical tiers.
For example, suppose your Compellent array consists of a top tier of fast, expensive 15,000-rpm SAS disks and a bottom tier of much larger, slower, and less expensive 7,200-rpm NL-SAS disks. Unless you configure it not to, the array will split incoming data into pages and write them across the disks in the top tier. Because Compellent arrays implement RAID at the page level, your array can choose which RAID level to use on a per-page basis. Since writing in RAID10 is faster than RAID5 or RAID6 (given that only two write operations are required and no parity must be computed), it will use RAID10.
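The RAID10 write advantage comes down to how many back-end I/Os each small host write generates. The sketch below uses the usual textbook write penalties for small random writes; these are general rules of thumb, not measurements of a Compellent array.

```python
# Back-end I/Os generated by one small random host write.
# Textbook values, not measurements of any particular array.
WRITE_PENALTY = {
    "RAID10": 2,  # write the data and its mirror copy
    "RAID5": 4,   # read old data and parity, write new data and parity
    "RAID6": 6,   # as RAID5, but with two parity blocks to read and write
}

def backend_ios(host_writes, raid_level):
    """Total back-end disk operations for a number of small host writes."""
    return host_writes * WRITE_PENALTY[raid_level]

for level in ("RAID10", "RAID5", "RAID6"):
    print(level, backend_ios(1000, level))
```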
However, top-tier disk capacity is typically limited and fairly expensive. The array won’t want to leave that storage sitting there for long unless there’s a good reason to. That’s where Data Progression comes into play. At some point every day (7 p.m. is the default) DP will run as a background process on the array, moving data to different tiers and changing the RAID level based on how heavily the data has been used and what policies you have set. DP will even differentiate between the faster outer rim of NL-SAS disks versus the slower inner tracks, creating a sort of tier within a tier (Dell calls this licensed feature FastTrack).
If that block of data you wrote has been written once and not read again since, it might be moved to the bottom tier and restriped using RAID5. If it had been read more frequently, it might be left in the faster, top-tier storage, but still restriped to RAID5, which is just as fast as RAID10 from a read perspective and takes up quite a bit less space. In both cases, these changes are made by a low-priority process that you’d configure to run at a time when the array isn’t under peak demand.
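The two outcomes described above can be sketched as a simple decision rule. Everything in this example is hypothetical: the `Page` type, the "never read" test, and the five-wide RAID5 stripe are illustrative stand-ins, and the real Data Progression logic also weighs the policies you have configured.

```python
from dataclasses import dataclass

@dataclass
class Page:
    reads_since_write: int
    tier: str = "top"
    raid: str = "RAID10"   # new writes land in RAID10 on the top tier

def nightly_progression(page):
    """Hypothetical daily pass: demote cold pages, restripe everything to RAID5."""
    if page.reads_since_write == 0:
        page.tier = "bottom"   # written once, never read: move to cheap disk
    page.raid = "RAID5"        # reads as fast as RAID10, but uses less space
    return page

def raw_gb_per_usable_gb(raid, stripe_width=5):
    """Raw capacity consumed per usable GB (stripe width is an assumption)."""
    if raid == "RAID10":
        return 2.0
    if raid == "RAID5":
        return stripe_width / (stripe_width - 1)
    raise ValueError(f"unsupported RAID level: {raid}")

print(nightly_progression(Page(reads_since_write=0)))
print(nightly_progression(Page(reads_since_write=12)))
print(raw_gb_per_usable_gb("RAID10"), raw_gb_per_usable_gb("RAID5"))
```

With a five-wide stripe, RAID5 consumes 1.25GB of raw capacity per usable gigabyte versus 2GB for RAID10, which is the space saving the restripe is after.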
All in all, Data Progression’s job is to give you the read and write performance of the top tier of disk for the data that needs it, while allowing you to leverage the economy of lower tiers of disk for less frequently used data. In situations where the array is sized properly, DP does this exceedingly well.
The Compellent Enterprise Manager will keep tabs on the usage of your two flash tiers — the write-intensive SLC SSDs and read-intensive MLC SSDs.
Having your cake and eating it too
Accomplishing this same feat when tiering between two tiers of SSDs is something of a different animal. Whereas Data Progression runs once a day in spinning-disk configurations, it operates continuously in tiered-flash configurations. In the case of tiered flash, DP is also heavily linked to the array’s snapshotting mechanism.
Like many fully virtualized arrays, Compellent arrays implement snapshots (“replays” in Compellent parlance) at a page level. When you write data into a volume, that data is split up into pages and written to disk. If you create a snapshot, those pages and any pages written before them are marked in a database as being part of that snapshot, but effectively nothing else happens — no data is immediately moved anywhere. Later, if some of the volume is rewritten with new data, that data is split up and written into different pages on the disk; the original pages still exist and are ready to be referenced if the snapshot is ever needed. Once a snapshot is deleted, the pages that comprised it are freed to be overwritten.
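A minimal sketch of this page-level approach follows. The `Volume` class and its method names are invented for illustration and say nothing about how Storage Center is actually implemented; the point is only that a snapshot records metadata, new writes go to fresh pages, and old pages are freed once nothing references them.

```python
class Volume:
    """Toy page-level volume with metadata-only snapshots (illustrative only)."""

    def __init__(self):
        self.pages = {}      # physical page id -> data
        self.active = {}     # logical block -> physical page id
        self.snapshots = {}  # snapshot name -> frozen logical-to-physical map
        self._next_page = 0

    def write(self, block, data):
        # New data always lands in a fresh page, so snapshot pages are untouched.
        page_id = self._next_page
        self._next_page += 1
        self.pages[page_id] = data
        old = self.active.get(block)
        self.active[block] = page_id
        self._release_if_unreferenced(old)

    def snapshot(self, name):
        # Only the mapping is recorded; no data is copied or moved.
        self.snapshots[name] = dict(self.active)

    def read(self, block, snapshot=None):
        mapping = self.snapshots[snapshot] if snapshot else self.active
        return self.pages[mapping[block]]

    def delete_snapshot(self, name):
        old_map = self.snapshots.pop(name)
        for page_id in set(old_map.values()):
            self._release_if_unreferenced(page_id)

    def _release_if_unreferenced(self, page_id):
        if page_id is None:
            return
        in_use = page_id in self.active.values() or any(
            page_id in m.values() for m in self.snapshots.values())
        if not in_use:
            del self.pages[page_id]  # page is free to be reused

vol = Volume()
vol.write(0, "original")
vol.snapshot("replay-1")
vol.write(0, "rewritten")
print(vol.read(0), vol.read(0, "replay-1"), len(vol.pages))
vol.delete_snapshot("replay-1")
print(len(vol.pages))
```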
In spinning-disk configurations, Data Progression treats pages that are part of a snapshot differently than it treats active data. Because it knows the snapshot data is far less likely to be read from once it has been replaced by newer data in the active volume, it will typically move those pages to a more economical tier during its next 7 p.m. run.
However, in tiered-flash configurations, Data Progression doesn’t wait for 7 p.m. to roll around to make tiering decisions. Instead, immediately upon the creation of a snapshot, Data Progression will punt data out of the top tier that is backed by expensive, write-optimized SLC SSD and write the data into inexpensive, read-optimized MLC SSDs.
The goal of this process is threefold:
Thus, Dell’s approach to flash tiering succeeds in leveraging the best that SLC and MLC devices bring to the table while avoiding the sweeping compromises made by single-tier deployments of “mixed-use SLC” (SLC with less wear-leveling capacity) and “eMLC” (MLC with added wear-leveling capacity). Said another way, it’s much more like having a tiered 15K SAS/7.2K NL-SAS spinning-disk array that can give you the benefit of both types of media versus having a single-tier 10K SAS spinning-disk array that gives you something in between.
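The snapshot-triggered move between the two flash tiers can be pictured with a few lines of code. This is a hypothetical sketch in which each tier is a plain dictionary of pages; the function name and data layout are assumptions, and the real migration is considerably more involved.

```python
def migrate_on_snapshot(slc_tier, mlc_tier, frozen_page_ids):
    """Move pages frozen by a snapshot from the SLC tier to the MLC tier.

    slc_tier, mlc_tier: dicts mapping page id -> data (illustrative stand-ins)
    frozen_page_ids:    pages that the new snapshot has made read-only
    Returns the number of pages moved.
    """
    moved = 0
    for page_id in frozen_page_ids:
        if page_id in slc_tier:
            # Read from write-optimized SLC, write into read-optimized MLC.
            mlc_tier[page_id] = slc_tier.pop(page_id)
            moved += 1
    return moved

slc = {1: "page-1", 2: "page-2", 3: "page-3"}
mlc = {}
print(migrate_on_snapshot(slc, mlc, [1, 2]), sorted(slc), sorted(mlc))
```

Once the frozen pages are out of the way, the SLC tier has room for new writes, while the migrated pages are still served from flash.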
Yes, there’s a catch
It’s rare that an engineering decision doesn’t have some kind of drawback. In this case, the catch is found in the creation of those snapshots that are so vital to the tiered-flash model. If data is immediately moved from the write-optimized SLC tier to the read-optimized MLC tier upon creation of a snapshot, there’s an obvious cost to doing that. The load on the SLC tier will increase as data is read out of it and written into the MLC tier, and this can’t help but impact performance on the SLC tier whenever host I/O is driving those SSDs to their limits. Worse yet, pages that are migrated from one tier to the next have to be locked during the operation, and this can cause contention in very high-I/O situations given that committing data to the MLC tier takes three to five times longer than reading it from the SLC tier.
To test the impact of this, I created a worst-case scenario in the lab. I set up a series of volumes and started directing a breakneck read and write load at all of them. In my case, it was a stream of randomized 4K I/Os with a 70/30 mix of reads versus writes (very roughly approximating an OLTP workload). This workload was isolated to a fairly small footprint on the array (about 80GB in total).
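For readers who want to picture that workload, the sketch below generates a comparable I/O pattern: random 4K-aligned offsets within an 80GB footprint and a 70/30 read/write mix. It is not the load generator used in the lab (that tool is not named here), and it only produces the pattern rather than issuing any real I/O.

```python
import random

def generate_io(count, footprint_bytes, block_size=4096,
                read_fraction=0.7, seed=0):
    """Yield (operation, byte_offset) tuples for a random mixed workload."""
    rng = random.Random(seed)
    blocks = footprint_bytes // block_size
    for _ in range(count):
        op = "read" if rng.random() < read_fraction else "write"
        offset = rng.randrange(blocks) * block_size  # block-aligned offset
        yield op, offset

footprint = 80 * 10**9  # roughly 80GB, as in the test described above
sample = list(generate_io(10_000, footprint))
reads = sum(1 for op, _ in sample if op == "read")
print(f"read share: {reads / len(sample):.2%}")
```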
Enterprise Manager will also help you keep an eye on the health and wear of the SSDs.
At first, the entry-level “6+6” (SLC+MLC) configuration handled this workload entirely with the SLC tier and clocked in at more than 70,000 IOPS with sub-5ms latencies — truly impressive considering a similarly priced spinning-disk array would be hard-pressed to serve up a third of those IOPS with three times the latency. However, things took a turn for the worse when I created a snapshot that simultaneously impacted all the volumes I was throwing my workload against. The I/O stream came to a screeching halt — immediately dropping to about 3,500 IOPS and slowly crawling back up to its previous speed over a period of a few minutes.
Any storage admins out there who are reading this right now will realize how crippling that could be in a production scenario. Having your storage throughput suddenly drop by 95 percent and your storage system take minutes to recover because you created a snapshot would be very bad indeed (think every phone in the help desk ringing at once). However, good storage admins will also recognize how incredibly unlikely this scenario is in most real-world situations.
The production loads you’ll find out in the field are generally very bursty on a subsecond basis. That is, if you were to create a graph of the duty cycle of a primary storage array in a typical enterprise with a resolution of 10ms or 20ms, you’d see it bounce around all over the place. The array could be very busy, but still have a lot of slack space where no transactions were being executed. It’s in this space where the on-demand portion of Compellent’s Data Progression software gets its work done, and where it can work without impacting host I/O to any great degree.
In my artificial lab test, however, the array was being pushed to its limit — effectively creating a 100 percent duty cycle. This left no room for Data Progression to do its work and created enough congestion between host writes and time-consuming SLC to MLC data migrations to cripple overall performance.
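The role of that slack can be shown with an idealized model. The sketch below assumes migrations run only in time slots where the host is idle, one page per slot; that strict prioritization is an assumption made for clarity, and the lab result above suggests the real array does not isolate the two quite so cleanly.

```python
def slots_to_finish(host_busy, pages_to_move):
    """Return the slot in which the last page is migrated, or None if never.

    host_busy:     sequence of booleans, one per time slot (True = host I/O)
    pages_to_move: pages waiting to move from SLC to MLC
    Assumption: one page migrates per idle slot, none during busy slots.
    """
    remaining = pages_to_move
    for slot, busy in enumerate(host_busy, start=1):
        if not busy and remaining > 0:
            remaining -= 1
        if remaining == 0:
            return slot
    return None

bursty = [True, False] * 10   # 50 percent duty cycle
saturated = [True] * 20       # 100 percent duty cycle

print(slots_to_finish(bursty, 5))
print(slots_to_finish(saturated, 5))
```

With a bursty load the migration completes in the gaps; with a saturated load it never gets a turn, so it must either stall or compete with host I/O.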
In the real world, if you have a write-heavy workload that requires the full raw performance of SLC flash 24/7, you probably won’t want to leverage Compellent’s Data Progression software at all. Instead, you’ll deploy enough SLC capacity to hold all the data you intend to hit this way and configure the storage policy that applies to it to prevent Data Progression from moving it out of SLC flash. That would neatly sidestep the entire issue while still allowing less brutally assaulted volumes on the same array to take advantage of the economy presented by SLC/MLC tiering.
If your workloads are more typically bursty or heavy only during certain times of day (like the bulk of enterprise workloads out there), you may never notice the impact of snapshot migrations. Or, if you do, you might well avoid it by scheduling snapshots to occur slightly less frequently than you might have otherwise.
The rest of the story
Certainly, tiered flash isn’t the only facet to Compellent or even the Storage Center 6.4 release. Generally speaking, Dell fields a very capable and cost-conscious midrange storage device in the Compellent product line.
Dual active-active SC8000 controllers based upon Dell’s R720 industry-standard servers can be equipped with 4Gbps, 8Gbps, and 16Gbps Fibre Channel; 1Gbps and 10Gbps Ethernet iSCSI; and 10Gbps FCoE host connectivity, as well as up to three redundant back-end 6Gbps SAS chains for disk shelf connectivity. Disk shelves come in three shapes and sizes with the 24-slot 2.5-inch SC220, 12-slot 3.5-inch SC200, and brand-new 84-slot 3.5-inch SC280. (The last one is limited to NL-SAS disks and clearly meant for high-density applications.) In the past, drives were generally sold on a per-disk basis, but Dell is starting to move more toward bundles that include a specific number of disks. For example, the SC280 is available in several half and fully populated configurations targeting various capacities.
Naturally, Enterprise Manager also puts performance statistics at your fingertips.
Software licensing add-ons are available for per-array features such as FastTrack, Remote Instant Replay (remote replication), and Live Volume (interarray volume migration). These are available in base licenses that cover 16 active disks (excluding hot spares and such) and add-on packs that will license an additional eight. It’s not hard to see that this is one area in which tiered flash configurations (which feature fewer disks versus 15K configurations) can show a clear savings.
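The licensing savings follow directly from the drive counts. The sketch below applies the 16-disk base license and eight-disk add-on packs described above to the 72-disk and 12-SSD configurations compared earlier; it assumes every drive counts as active and that SSDs are counted the same way as disks, and it says nothing about actual prices.

```python
import math

def license_units(active_drives, base_covers=16, pack_covers=8):
    """Return (base_licenses, add_on_packs) needed for one licensed feature."""
    extra = max(0, active_drives - base_covers)
    packs = math.ceil(extra / pack_covers)
    return 1, packs

# Assumption: all drives are active (hot spares would be excluded in practice).
print("72 x 15K SAS:", license_units(72))
print("6 SLC + 6 MLC:", license_units(12))
```

Under those assumptions the 72-disk array needs seven add-on packs per licensed feature on top of the base license, while the 12-drive flash array fits within the base license alone.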
Array management is provided by a mix of the Storage Center software, resident on each set of controllers, and Enterprise Manager, which forms the single pane of glass necessary to orchestrate the operations of multiple arrays. Strictly speaking, you can live without Enterprise Manager (a separate Windows-based, SQL-driven reporting and management app), but it does make simple tasks such as configuring a volume to replicate from one array to another substantially easier. You’ll also need Enterprise Manager if you intend to integrate with third-party software such as Microsoft’s Virtual Machine Manager or VMware’s Site Recovery Manager.
If you already own a Compellent array and are interested in the new flash tiering, you’ll have to wait a little while. Although you can purchase a new three-tier array including write-intensive SLC SSDs, read-intensive MLC SSDs, and 7,200-rpm NL-SAS, you can’t graft the tiered SSDs into an existing array. Dell plans to add that functionality in the next Storage Center release, so stay tuned on that front.
Putting it all together
To sum it all up, I liked the Compellent line before this new release, and I think it has even more potential now. Software features such as Data Progression and Remote Instant Replay coupled with high-performance industry-standard hardware combine to deliver a feature-rich and relatively inexpensive primary storage option. The addition of flash tiering and higher-density NL-SAS can only serve to broaden the Compellent’s appeal to a wider audience.
However, no enterprise storage on the market today is perfect. The I/O congestion that occurs during snapshot creation in Compellent’s tiered flash configurations under high I/O load really is concerning, not so much because it exists but because the management software doesn’t expose what is happening when it does.
Certainly, any work Dell can do to eliminate this issue (by tweaking its I/O queuing to further deprioritize tier migrations versus host I/O) would be welcome. However, I think it may actually be more important to do some work on the Enterprise Manager and Storage Center UIs to expose more of what Data Progression is doing. It’s not so bad to have the magic stay behind the curtain when it’s a once-a-day background task, but when it becomes an always-on feature, having second-by-second visibility into what’s happening inside the black box becomes critical.
All that said, I would strongly recommend putting the Compellent Storage Center on your dance card the next time you’re facing a major primary storage upgrade.