Optimizing data storage

The last five years have been about IT efficiency — from an operations standpoint, and also in terms of making our “stuff” more efficient.

We’ve made storage, networks and servers more efficient by virtualizing them. But now it’s time to stop concerning ourselves with making gear more efficient and instead focus on data efficiency.

After all, who cares about gear, other than gear makers? It’s all about the data. It is time to further leverage our commodity hardware by systematically adding smart services on top of it so we can more readily focus on extracting as much value from our data as is possible.

Data efficiency is just as it sounds: making the data we need more efficient to access, use and manage. That lets us drive more value out of it, which, I would argue, is the entire raison d’être for IT.

In the storage world, data deduplication has been a hot efficiency “enabler,” along with thin provisioning, snapshots, virtualization, multi-tenancy and data compression. Some of them are new, and some have been around forever, it seems. All of them are important.

But when it comes to making data more efficient, it’s important to consider the “why” and not just the “how.” For example, most data deduplication solutions are designed for backup, not primary data environments. I’m all for making data backups more efficient, but that only represents a small fraction of the value potential for IT.

We have done a decent job, over the last five or so years, at making the systems that store and manage data far more efficient. We can thin provision (virtualize) physical storage assets so that we get the most use out of them. We virtualize data (from a presentation perspective) with the use of snapshots. With multi-tenancy, we can optimize the utilization of our physical assets across multiple constituents. All that is good, but new technologies exist that will allow us to take this much further.

Data compression comes of age
Data compression has been around for a long time, but it is currently enjoying a renaissance. Primary data compression is going to change the fundamental efficiency, and the overall value, that users derive from their storage. That’s because you get more value when you create efficiencies closer to the point of data “creation.” Think of it this way: If you start with 100 GB of primary data, over time you will back it up x times, so you’ll end up with 100 GB of primary data and 100x GB of backup, or secondary, data. Backup deduplication players such as EMC Data Domain spend their time on the 100x problem, and this is a good problem to spend time on. But there are probably a lot of other uses and duplicates of the originating data throughout the organization between creation and backup, such as in test/development, data warehouses, etc.
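
To put rough numbers on that arithmetic, here is a minimal sketch in Python. It is a deliberately simplified model, not a description of how any particular product behaves: it assumes every backup and every other copy is a full copy of the primary data, and that a reduction applied at the primary tier carries through to each downstream copy. The function name, the copy counts and the 2:1 compression ratio are all hypothetical values chosen for illustration.

```python
def total_footprint_gb(primary_gb, backup_count, other_copies=0, compression_ratio=1.0):
    """Estimate total stored GB for primary data plus its downstream copies.

    Simplifying assumptions (illustrative only):
      - each backup and each other copy is a full copy of the primary data
      - compression applied at the primary tier persists in every copy
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    stored_primary_gb = primary_gb / compression_ratio
    # One primary copy, plus x backups, plus test/dev or warehouse copies.
    copy_count = 1 + backup_count + other_copies
    return stored_primary_gb * copy_count


# Hypothetical scenario: 100 GB primary, 10 backups, 2 test/dev copies.
baseline = total_footprint_gb(100, backup_count=10, other_copies=2)
optimized = total_footprint_gb(100, backup_count=10, other_copies=2,
                               compression_ratio=2.0)

print(f"Uncompressed footprint: {baseline:.0f} GB")
print(f"Compressed at primary:  {optimized:.0f} GB")
print(f"Downstream savings:     {baseline - optimized:.0f} GB")
```

Under these assumptions, halving the 100 GB at the point of creation saves 50 GB on the primary tier, and the same 50 GB again for every copy made afterward. Real environments use incremental backups and deduplication, so actual figures will differ.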

Optimizing data as early as possible is the key. From that point on, all the downstream benefits are magnified. There’s less to move, less to manage, less to back up, less to copy, less to replicate, less to store, and less to break. Less is the new more.