Storage Accelerators: Bridging the Cloud Computing Storage I/O Bottleneck

It’s no secret that the cloud computing market has been growing rapidly in both public and private deployments, driving hyper-scale infrastructure to store, process, and deliver ever-growing volumes of data. To meet growing cloud application demands and cut infrastructure costs, public and enterprise clouds increasingly use virtual machines (VMs) to consolidate applications onto fewer servers.

Enterprise virtualization addresses the problem of under-utilized servers, but the next big challenge cloud computing faces is the storage I/O bottleneck that comes with large-scale VM deployments. As the number of VMs grows, their individual I/O streams interleave at the storage, and the resulting explosion in random read and write I/Os inevitably brings a network attached storage/storage area network (NAS/SAN) array or local direct attached storage (DAS) to its knees as disk I/O or target-side CPU performance becomes the bottleneck.

To work around these pain points, storage managers add capacity to their storage infrastructure to meet performance demand. Essentially, they are trying to give the storage system access to enough hard disk spindles that it can respond quickly to the massive random I/O these environments generate. These “solutions” lead to rack after rack of disk shelves with very low actual capacity utilization, and they do not scale effectively in purchase cost, administration overhead or maintenance expenses (power, cooling, floor space). This is what justifies the need for storage accelerators.
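
The spindle-count reasoning above is simple arithmetic, and a short Python sketch makes it concrete. All figures in it are illustrative assumptions, not measurements: 40 VMs at 250 IOPS each, roughly 180 random IOPS per 15K RPM disk, a 70/30 read/write mix, a RAID 10 write penalty of 2, and 5 TB of data on 0.6 TB disks. The function names are hypothetical.

```python
import math


def backend_iops(host_iops: float, read_ratio: float = 1.0, write_penalty: int = 1) -> float:
    """Convert host-visible IOPS into disk IOPS.

    Reads cost one disk I/O each; each host write costs `write_penalty`
    disk I/Os (e.g. 2 for mirrored RAID 10).
    """
    reads = host_iops * read_ratio
    writes = host_iops * (1.0 - read_ratio)
    return reads + writes * write_penalty


def spindles_needed(host_iops: float, iops_per_disk: float,
                    read_ratio: float = 1.0, write_penalty: int = 1) -> int:
    """Smallest number of disks whose combined random IOPS covers the workload."""
    return math.ceil(backend_iops(host_iops, read_ratio, write_penalty) / iops_per_disk)


def capacity_utilization(data_tb: float, disks: int, tb_per_disk: float) -> float:
    """Fraction of raw disk capacity that actually holds data."""
    return data_tb / (disks * tb_per_disk)


if __name__ == "__main__":
    host_iops = 40 * 250  # hypothetical: 40 VMs at 250 IOPS each
    disks = spindles_needed(host_iops, iops_per_disk=180, read_ratio=0.7, write_penalty=2)
    print(disks, f"{capacity_utilization(5.0, disks, 0.6):.0%}")  # prints: 73 11%
```

Under these assumptions the disks are bought for IOPS rather than capacity, so almost 90% of the raw capacity sits idle, which is the low utilization described above.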

Available options and the most promising solution

This is where SSDs attract storage innovators. Although relatively low in capacity, solid state storage provides extremely high input/output operations per second (IOPS) performance that can potentially solve most storage I/O related challenges in today's data center.

Vendors such as EMC, Marvell, QLogic, NetApp and Dell are all developing solutions to bring SSDs to their customers. The following are the main ways SSDs can be deployed, along with their individual limitations:

Fixed Placement:

Fixed placement on solid state storage may be acceptable for certain workloads where specific subsets of data can be placed on SSDs; database application hot files (e.g. indexes, aggregate tables, materialized views) are good examples. However, this approach does not support a full complement of storage services (snapshots, replication, etc.), and many fixed-placement products lack complete high availability options, or the cost of implementing high availability is simply too high, ruling this out as a general solution.

Automated Tiering:

Automated tiering works by moving sections of data to high performance storage as they become active and demoting them as they become less active. However, implementing this solution requires a storage system that supports automated tiering, which may mean upgrading to new storage infrastructure. Second, depending on the size of the data sections being promoted, the time it takes the storage controller to analyze access patterns and start promoting data to the SSD tier can delay ROI by days or weeks. The third, and considerable, limitation of this option is SSD wear-out: constantly promoting and demoting large chunks of data means constantly rewriting them to flash, which causes write amplification.
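
The promote/demote loop can be sketched in Python under a deliberately simple, assumed policy: count accesses per fixed-size extent during an analysis window, then keep the N hottest extents on SSD. Real arrays use their own heuristics; `AutoTierer`, `record_io` and `rebalance` are hypothetical names. The sketch also reports how many bytes each rebalance moves, which shows why larger sections mean more data rewritten to flash.

```python
from collections import Counter


class AutoTierer:
    """Toy automated tiering: hot extents live on SSD, the rest on HDD."""

    def __init__(self, ssd_slots: int, extent_size: int):
        self.ssd_slots = ssd_slots        # how many extents fit on the SSD tier
        self.extent_size = extent_size    # bytes per promotable section of data
        self.window: Counter = Counter()  # access counts for the current analysis window
        self.on_ssd: set[int] = set()

    def record_io(self, offset: int) -> None:
        """Attribute one I/O at a byte offset to its extent."""
        self.window[offset // self.extent_size] += 1

    def rebalance(self) -> tuple[set[int], set[int], int]:
        """End the window: promote the hottest extents, demote everything else.

        Returns (promoted, demoted, bytes_moved). Every move copies a whole
        extent, and every promotion is a write to flash.
        """
        hottest = {ext for ext, _ in self.window.most_common(self.ssd_slots)}
        promoted = hottest - self.on_ssd
        demoted = self.on_ssd - hottest
        self.on_ssd = hottest
        self.window.clear()
        bytes_moved = (len(promoted) + len(demoted)) * self.extent_size
        return promoted, demoted, bytes_moved


if __name__ == "__main__":
    MiB = 1 << 20
    tier = AutoTierer(ssd_slots=2, extent_size=MiB)
    for ext, hits in [(0, 5), (3, 3), (7, 1)]:
        for _ in range(hits):
            tier.record_io(ext * MiB)
    print(tier.rebalance())  # ({0, 3}, set(), 2097152)
```

Nothing moves until `rebalance` runs at the end of a window, which is the analysis delay mentioned above; doubling `extent_size` doubles the bytes rewritten for the same access pattern.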

Cache Appliances:

To alleviate some of these issues, several third-party manufacturers have created external caching appliances. These systems sit inline between the servers accessing storage and the storage itself; in other words, all traffic must flow through the devices. This approach does create a broad caching tier that boosts performance for more of the storage, but it may be too broad, since not all data passing through these devices is appropriate for caching. Because of their inline nature, these solid state caches are also vulnerable to the performance limitations of the storage network and the storage controller. Finally, the inline appliance itself can limit scale and be overrun when many application servers channel storage I/O through it.

Limitations of available off-server solutions

None of the three solutions above actually improves the performance of the storage network or the storage controller; in fact, they often expose its shortcomings.

They also ignore the fact that the device that needs the storage I/O performance boost is the application server or virtual host. This is where server-based storage accelerators shine.

Server Based Storage Acceleration

Server-based acceleration via caching takes the concepts of the cache appliance and moves them into the server, typically on a PCIe card. This provides two significant advantages. First, the problem is fixed closer to its source (the application or hypervisor), and cached I/O does not need to traverse the storage network. Second, instead of deploying something universally to solve a specific problem, the accelerator can be deployed selectively, only to the servers where the problem exists and more performance is needed, which makes it the most cost-effective option.
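
The core of a server-side read cache can be sketched in a few lines of Python. This is an illustrative model, not any vendor's design: `HostReadCache` is a hypothetical class, a dictionary-backed LRU stands in for the PCIe flash, and `backend_read` stands in for a read that has to cross the storage network. Only cache misses reach the backend.

```python
from collections import OrderedDict
from typing import Callable


class HostReadCache:
    """LRU block cache living in the server, in front of networked storage."""

    def __init__(self, capacity_blocks: int, backend_read: Callable[[int], bytes]):
        self.capacity = capacity_blocks
        self.backend_read = backend_read  # slow path: SAN/NAS round trip
        self.blocks: OrderedDict[int, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, lba: int) -> bytes:
        if lba in self.blocks:
            self.blocks.move_to_end(lba)  # mark as most recently used
            self.hits += 1
            return self.blocks[lba]
        self.misses += 1
        data = self.backend_read(lba)  # only misses touch the storage network
        self.blocks[lba] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


if __name__ == "__main__":
    calls = []

    def fake_san_read(lba: int) -> bytes:  # hypothetical stand-in for the array
        calls.append(lba)
        return f"block-{lba}".encode()

    cache = HostReadCache(2, fake_san_read)
    for lba in [1, 2, 1, 3, 1, 2]:
        cache.read(lba)
    print(cache.hits, cache.misses, calls)  # 2 4 [1, 2, 3, 2]
```

Because the cache is per server, it can be sized for, and installed on, only the hosts that need it.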

Variations of Server Based Caching Accelerators:

Software Based Server Accelerators:

In this category, caching software for virtualized servers makes caching decisions on the host, much closer to the source than either caching appliances or disk arrays.

Leading the pack is FusionIO’s ioTurbine, application-level caching software that runs in the background as a component of both the hypervisor and the guest operating system. Caching decisions are made in the guest OS rather than the host OS, right where the application generates the data, which accelerates performance directly for the virtualized applications that require it.

What lets FusionIO’s ioTurbine deliver a low-latency, high-IOPS caching solution is the ioDrive’s Virtual Storage Layer (VSL). With VSL, the CPU interacts directly with the ioDrive as though it were just another memory tier below DRAM. Without it, access would have to be serialized through a RAID controller and embedded processors, causing unnecessary context switching and queuing bottlenecks that ultimately lead to high latency.

The Virtual Storage Layer (VSL) virtualizes the NAND flash arrays by combining key elements of two operating system subsystems, the I/O subsystem and the virtual memory subsystem. It uses “block tables” that translate block addresses to physical ioDrive addresses, much as a virtual memory subsystem translates virtual addresses to physical ones. With VSL, these block tables are stored in host memory, whereas other solid state architectures store them in embedded RAM on the device and therefore have to go through legacy protocols.
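
The block-table idea can be illustrated with a toy logical-to-physical map held in an ordinary host-memory dictionary. This is a generic flash-translation sketch under simplifying assumptions (one table entry per block, a FIFO free list, no persistence or crash recovery), not the actual VSL data structures; `BlockTable` and its methods are hypothetical. Because NAND pages cannot be overwritten in place, every rewrite goes to a fresh page and the old page is marked stale.

```python
from collections import deque
from typing import Optional


class BlockTable:
    """Toy host-memory translation table: logical block address -> physical flash page."""

    def __init__(self, num_pages: int):
        self.l2p: dict[int, int] = {}               # the "block table" itself
        self.free_pages = deque(range(num_pages))   # erased pages ready for writing
        self.stale: set[int] = set()                # pages holding superseded data

    def write(self, lba: int) -> int:
        """Out-of-place write: allocate a fresh page and remap the logical block."""
        if not self.free_pages:
            raise RuntimeError("no free pages; garbage collection needed")
        page = self.free_pages.popleft()
        old = self.l2p.get(lba)
        if old is not None:
            self.stale.add(old)                     # the old copy becomes garbage
        self.l2p[lba] = page
        return page

    def lookup(self, lba: int) -> Optional[int]:
        """Translate a logical block to its current physical page (None if never written)."""
        return self.l2p.get(lba)


if __name__ == "__main__":
    table = BlockTable(num_pages=8)
    table.write(10)
    table.write(11)
    table.write(10)                                 # rewrite lands on a new page
    print(table.lookup(10), sorted(table.stale))    # 2 [0]
```

Keeping this table in host DRAM makes every translation a host memory access, which is the latency advantage described above, at the price of host memory and CPU.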

However, such software consumes host resources such as CPU and memory for flash management tasks like wear leveling and garbage collection. These tasks are heavy users of CPU that, by all rights, should be dedicated to serving the application.
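
To give a sense of the bookkeeping involved, here is a small Python sketch of two common flash-management calculations: greedy garbage-collection victim selection, with ties broken toward the least-worn block as a simple form of wear leveling, and the write amplification that results from copying still-valid pages. The policy, the function names `pick_victim` and `write_amplification`, and the sample numbers are illustrative assumptions, not any vendor's algorithm.

```python
def pick_victim(valid_pages: list[int], erase_counts: list[int]) -> int:
    """Greedy GC: reclaim the erase block with the fewest valid pages to copy.

    Ties go to the block erased least often, spreading wear across the flash.
    """
    return min(range(len(valid_pages)), key=lambda i: (valid_pages[i], erase_counts[i]))


def write_amplification(host_pages: int, gc_copied_pages: int) -> float:
    """Pages physically written to flash per page the host asked to write."""
    return (host_pages + gc_copied_pages) / host_pages


if __name__ == "__main__":
    valid = [60, 12, 12, 40]   # hypothetical valid-page counts per erase block
    erases = [1, 9, 3, 2]      # hypothetical erase counts per erase block
    victim = pick_victim(valid, erases)
    copied = valid[victim]     # valid pages must be rewritten before the erase
    print(victim, write_amplification(host_pages=1000, gc_copied_pages=copied))  # prints: 2 1.012
```

In a host-based design, loops like these run on the server's own CPU for every reclaimed block, which is the resource cost this paragraph describes.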

Second, a few other caching software offerings are currently not integrated with solid state devices: the software is purchased from one vendor and the solid state device from another. This typically raises questions and issues, chief among them whether the software and the solid state device work well together.

Hardware Based Server Accelerators:

Unlike software-based accelerators, hardware-based server storage accelerators (SSAs) may take one of two forms:

  1. Integrated storage adapters, i.e. HBA- or NIC-enabled caches
  2. Solid state PCIe devices or SAS/SATA SSD devices

Solid state PCIe Adapter Devices:

PCIe adapter storage accelerators, usually comprising SSDs, DRAM and embedded firmware, work by intercepting I/O, redirecting it to high-speed local storage (SSDs), and thereby accelerating it. This requires a tightly coupled I/O interaction layer combined with an innovative hardware layer built around a special-purpose ASIC.

Their intermediation is not trivial, and it is only possible today because of intersecting trends in operating systems, virtualization, consolidation and processing power that have improved vendors' ability to interact with storage I/O paths. Today's I/O stacks are easier to interact with than ever before, whatever the operating system, application or hypervisor, and this is what makes server-based acceleration possible. An additional unique feature is the ability to create a host I/O cache that is agnostic to all network and local-attached storage protocols. It can be configured as a data cache for DAS, SAN or NAS storage arrays, irrespective of whether iSCSI, SAS or NFS is used to access the data.
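
The protocol-agnostic property follows from where the cache sits: it intercepts block reads and writes above the transport, so it only needs a common block interface, not knowledge of iSCSI, SAS or NFS. The Python sketch below models that with a `typing.Protocol`; `BlockTarget`, `FakeTarget` and `InterceptingCache` are hypothetical names, and the fake targets simply stand in for, say, a DAS disk and a SAN LUN. Writes pass straight down and invalidate the cached copy.

```python
from typing import Protocol


class BlockTarget(Protocol):
    """Anything that can read and write blocks, whatever protocol carries them."""

    def read(self, lba: int) -> bytes: ...
    def write(self, lba: int, data: bytes) -> None: ...


class FakeTarget:
    """Hypothetical stand-in for a DAS disk, iSCSI LUN, NFS-backed image, etc."""

    def __init__(self) -> None:
        self.store: dict[int, bytes] = {}
        self.reads = 0

    def read(self, lba: int) -> bytes:
        self.reads += 1
        return self.store.get(lba, bytes(4))  # unwritten blocks read as zeros

    def write(self, lba: int, data: bytes) -> None:
        self.store[lba] = data


class InterceptingCache:
    """Sits in the I/O path above any BlockTarget and redirects repeat reads to local flash."""

    def __init__(self, target: BlockTarget) -> None:
        self.target = target
        self.local_flash: dict[int, bytes] = {}  # stand-in for the on-card SSD

    def read(self, lba: int) -> bytes:
        if lba not in self.local_flash:
            self.local_flash[lba] = self.target.read(lba)
        return self.local_flash[lba]

    def write(self, lba: int, data: bytes) -> None:
        self.target.write(lba, data)     # pass the write down unchanged
        self.local_flash.pop(lba, None)  # invalidate any stale cached copy


if __name__ == "__main__":
    das, san = FakeTarget(), FakeTarget()
    for cache in (InterceptingCache(das), InterceptingCache(san)):  # same code, any target
        cache.write(0, b"abcd")
        for _ in range(3):
            cache.read(0)
    print(das.reads, san.reads)  # 1 1
```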

Leading the innovation are well-known vendors such as Marvell, with DragonFly, and EMC, with VFCache.

Marvell’s DragonFly enables a next-generation, cloud-optimized data center architecture in which data is automatically cached in a low-latency, high-bandwidth “host I/O cache” in application servers on its way to and from higher-latency, higher-capacity data storage. A unique differentiator for Marvell DragonFly is its sophisticated NVRAM log-structured approach to flash-aware write buffering and re-ordered coalescing. Whereas SSDs can degrade quickly after a certain number of random writes, Marvell positions DragonFly as delivering consistently high performance and low latency with zero write performance degradation over time.
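
Log-structured write buffering with coalescing is a general technique, and a minimal Python sketch shows the idea: random writes are appended to a log (standing in for NVRAM), and on flush, repeated writes to the same block collapse to the latest version and the survivors are re-ordered by address for one sequential pass to flash. This illustrates the technique under those assumptions; it is not Marvell's DragonFly implementation, and `LogWriteBuffer` and `flush_threshold` are hypothetical.

```python
class LogWriteBuffer:
    """Append-only write log that coalesces and re-orders writes before flushing."""

    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold    # log entries held before a flush
        self.log: list[tuple[int, bytes]] = []    # (lba, data) in arrival order

    def write(self, lba: int, data: bytes) -> list[tuple[int, bytes]]:
        """Acknowledge once logged; returns the writes issued downstream, if any."""
        self.log.append((lba, data))              # sequential append to the log
        if len(self.log) >= self.flush_threshold:
            return self.flush()
        return []

    def flush(self) -> list[tuple[int, bytes]]:
        latest: dict[int, bytes] = {}
        for lba, data in self.log:                # later entries overwrite earlier ones
            latest[lba] = data
        self.log.clear()
        return sorted(latest.items())             # ascending LBA order for the flash


if __name__ == "__main__":
    buf = LogWriteBuffer(flush_threshold=5)
    for lba, data in [(9, b"a"), (2, b"b"), (9, b"c"), (5, b"d")]:
        buf.write(lba, data)                      # buffered, nothing issued yet
    print(buf.write(2, b"e"))  # [(2, b'e'), (5, b'd'), (9, b'c')]
```

Here five random host writes reach the flash as three writes in address order, which is how this kind of buffering reduces both the write count and the randomness the SSD sees.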

EMC’s VFCache, on the other hand, accelerates reads and protects data by using a write-through cache: every write goes to the networked storage, which continues to provide persistent high availability, integrity and disaster recovery.
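
The data-protection argument for write-through can be shown with a small Python model that contrasts it with write-back. Both classes are hypothetical illustrations, with plain dictionaries standing in for the server-side flash and the networked array: with write-through, a write is complete only once the array has it, so losing the server-side cache loses nothing; with write-back, acknowledged writes that have not yet been destaged exist only in the cache.

```python
class WriteThroughCache:
    def __init__(self, array: dict[int, bytes]):
        self.array = array                  # networked storage (authoritative copy)
        self.cache: dict[int, bytes] = {}   # server-side flash

    def write(self, lba: int, data: bytes) -> None:
        self.array[lba] = data              # persist on the array first...
        self.cache[lba] = data              # ...then keep a copy for fast reads

    def read(self, lba: int) -> bytes:
        if lba not in self.cache:
            self.cache[lba] = self.array[lba]
        return self.cache[lba]


class WriteBackCache(WriteThroughCache):
    def __init__(self, array: dict[int, bytes]):
        super().__init__(array)
        self.dirty: set[int] = set()        # acknowledged but not yet on the array

    def write(self, lba: int, data: bytes) -> None:
        self.cache[lba] = data              # acknowledged from the cache alone
        self.dirty.add(lba)

    def destage(self) -> None:
        for lba in sorted(self.dirty):
            self.array[lba] = self.cache[lba]
        self.dirty.clear()


if __name__ == "__main__":
    for cls in (WriteThroughCache, WriteBackCache):
        array = {}
        c = cls(array)
        c.write(7, b"new")
        c.cache.clear()                     # simulate losing the server or its flash card
        print(cls.__name__, array.get(7))
    # WriteThroughCache b'new'
    # WriteBackCache None
```

The trade-off is that write-through gives no write acceleration, which is consistent with VFCache being described as accelerating reads.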

Coupled with array-based EMC FAST technology on EMC storage arrays, VFCache can help place application data in the right storage tier based on how frequently it is accessed. VFCache extends FAST technology from the storage array to the server by identifying the most frequently accessed data and promoting it into a tier that is closest to the application.

All in all, VFCache is a hardware and software server caching solution that aims to dramatically improve application response time and deliver more IOPS.

Integrated storage adapters (HBA or NIC enabled caches):

In the current market, the first and only player in the integrated storage adapter (HBA- or NIC-enabled cache) race is QLogic, with its network-based adapter Mt. Rainier.

Mt. Rainier combines an enterprise server I/O adapter, a flash/SSD adapter, an optimized driver and onboard firmware intelligence. This enhanced network HBA seamlessly captures all I/O and redirects it to the attached PCIe flash storage.

In the future, such accelerators, ready to deploy in the infrastructure with no additional software required, could eventually become the de facto HBA/CNA.


