The options besides RAID for hyperscale storage

One of the issues with depending on redundant array of independent disks in hyperscale environments is the reduced usable storage versus purchased raw capacity.

It’s been accepted dogma that the best way to protect data on disks is with a redundant array of independent disks (RAID). RAID distributes data across multiple disk drives along with parity data, extra information calculated from the stored data, so that if a drive fails, its contents can be reconstructed from the remaining good drives. Most enterprise storage systems today rely on RAID 6 for protecting data in case of failure; RAID 6-protected systems can survive two simultaneous drive failures and still recover all the stored data.
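To make the parity idea concrete, here is a minimal Python sketch using simple XOR parity and made-up data blocks. RAID 6 actually maintains two independent parity calculations so it can tolerate two failures, but the reconstruction principle is the same.

    # Single-parity recovery, the idea behind RAID parity protection.
    # The data blocks below are hypothetical; RAID 6 keeps two parity values.
    data_drives = [b"\x10\x22\x31", b"\x05\x0f\x00", b"\x7a\x01\xee"]

    # The parity block is the byte-wise XOR of all data blocks.
    parity = bytes(a ^ b ^ c for a, b, c in zip(*data_drives))

    # Lose drive 1, then rebuild its contents from the survivors plus parity.
    survivors = [data_drives[0], data_drives[2], parity]
    rebuilt = bytes(a ^ b ^ c for a, b, c in zip(*survivors))
    assert rebuilt == data_drives[1]   # the lost block is recovered exactly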

One of the issues with depending on RAID in hyperscale environments is the reduced usable storage versus purchased raw capacity. With RAID 6, usable storage can be as little as 60 percent of the raw capacity because of all the extra parity data being stored. That virtually guarantees inefficiency in data storage, and in today’s big data environments, it means big inefficiency.
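The arithmetic behind that figure is straightforward: in a RAID 6 group, two drives’ worth of capacity hold parity rather than data, so a five-drive group (one common configuration, assumed here purely for illustration) yields 60 percent usable capacity.

    # Usable fraction of raw capacity for an n-drive RAID 6 group (n - 2 data drives).
    def raid6_usable_fraction(drives_per_group: int) -> float:
        return (drives_per_group - 2) / drives_per_group

    print(raid6_usable_fraction(5))    # 0.6 -- the 60 percent figure cited above
    print(raid6_usable_fraction(10))   # 0.8 -- wider groups give back capacity,
                                       #        at the cost of longer rebuilds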

As hyperscale storage systems have grown to store multiple petabytes, non-RAID techniques for protecting data on disk are coming into use, especially where the data being stored is immutable and unstructured, exactly the type of data driving the lion’s share of data growth in hyperscale systems. Examples of immutable, unstructured data among government agencies include documents, medical records, scientific instrument data, and intelligence, surveillance and reconnaissance data.

The simplest of these new techniques is replication. As data is stored on a disk, it is simply copied to another disk or disks, usually on a system in a remote location (a model often described as cloud storage), providing both protection against a failed disk drive and built-in disaster recovery. If one storage system fails, the data can be retrieved immediately from the remote system. Amazon’s Simple Storage Service, OpenStack Swift and DDN’s Web Object Scaler (WOS) are examples of hyperscale object storage implementations that use replication to protect data from disk failures and provide disaster recovery.
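Here is a minimal sketch of the replication idea, with local directories standing in for independent storage systems. The paths and function names are hypothetical; real object stores such as S3, Swift and WOS handle placement, consistency and failover internally.

    from pathlib import Path

    # Hypothetical stand-ins for three independent storage systems or sites.
    REPLICA_ROOTS = [Path("/mnt/site-a"), Path("/mnt/site-b"), Path("/mnt/site-c")]

    def put(key: str, payload: bytes) -> None:
        # Write the same bytes to every replica location.
        for root in REPLICA_ROOTS:
            root.mkdir(parents=True, exist_ok=True)
            (root / key).write_bytes(payload)

    def get(key: str) -> bytes:
        # Any surviving copy will do; skip locations that have failed.
        for root in REPLICA_ROOTS:
            try:
                return (root / key).read_bytes()
            except OSError:
                continue
        raise FileNotFoundError(key)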

The penalty for replication is that multiple copies of data obviously take up more disk space than a single RAID-protected copy (although, given the storage inefficiencies of RAID 6, not as much more as you might think). Enter erasure coding. Erasure coding breaks each object into smaller pieces and spreads those pieces, along with additional coded pieces, across multiple drives so that the data can still be recovered even if several of those drives fail.
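As a rough illustration, here is a sketch of a k-plus-one erasure code built on the same XOR parity shown earlier. Production systems such as WOS use stronger codes (typically Reed-Solomon variants) that add several coded pieces and so survive multiple simultaneous drive failures, but the storage economics are the same: a little extra parity instead of whole extra copies. The object contents and the choice of k = 4 are assumptions for the example.

    def encode(obj: bytes, k: int) -> list:
        """Split obj into k equal pieces and append one XOR parity piece."""
        size = -(-len(obj) // k)                      # ceiling division
        pieces = [obj[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
        parity = bytearray(size)
        for piece in pieces:
            for i, byte in enumerate(piece):
                parity[i] ^= byte
        return pieces + [bytes(parity)]

    def rebuild_missing(pieces: list) -> list:
        """Recover a single lost piece (marked None) by XOR-ing the rest."""
        size = len(next(p for p in pieces if p is not None))
        lost = pieces.index(None)
        rebuilt = bytearray(size)
        for j, piece in enumerate(pieces):
            if j != lost:
                for i, byte in enumerate(piece):
                    rebuilt[i] ^= byte
        pieces[lost] = bytes(rebuilt)
        return pieces

    # Four data pieces plus one parity piece: 25 percent overhead,
    # versus 100 percent or more for keeping whole replicas.
    stored = encode(b"immutable unstructured data, e.g. a scanned document", k=4)
    stored[2] = None                                  # simulate one failed drive
    recovered = rebuild_missing(stored)
    assert b"".join(recovered[:4]).rstrip(b"\0") == (
        b"immutable unstructured data, e.g. a scanned document"
    )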

Examples of object storage erasure coding at hyperscale include DDN’s WOS, which erasure codes data across multiple drives in a single local storage system, providing protection against a dual disk failure, as in RAID 6. This application of local erasure coding is particularly valuable in high-performance applications where fast data storage and quick access to information are critical.

Not surprisingly, the federal government is showing great interest in object, or cloud, storage. Originally the appeal was greater collaboration and security for agency data, but in the current budget environment, a 60 percent reduction in disk cost across a multi-petabyte environment can have a huge impact on an agency’s bottom line. Many CIOs have also realized that, with fewer disks and less system complexity, their total cost of storage ownership may drop by more than 60 percent once the power, cooling and staff time required to manage the system are factored in.