Performance and Scalability Analysis of Storage Systems

Abstract

As virtualized environments grow rapidly, the need to manage a large number of virtual objects is also increasing. Storage systems are part of virtualized and non-virtualized environments, public and private clouds, and small and large datacentres alike. The performance and scalability of storage systems are therefore gaining attention from professionals, vendors and organizations. This whitepaper places performance and scalability testing of storage systems at the centre of the discussion.

 

1.     Introduction

Meeting the performance and scalability requirements of storage systems is one of the most important objectives of product teams. The list of parameters that can affect performance and scalability results is long. In a fully scaled environment, a large number of hardware devices, network devices, storage nodes, servers and software components are involved, so finding bottlenecks can become very complex. Setting up a scalability environment accurately is usually a time-consuming task and requires careful adherence to industry best practices for optimized usage of storage systems. Designing and executing test cases, analysing results and drawing conclusions all require a systematic approach.

Before taking a deep dive into the topic, the basic terminology associated with performance and scalability testing of storage systems is explained. With some examples, this whitepaper will help you find the scalability limits of storage systems. Along the way, you will learn what a performance baseline is and how it can be established. The paper explains how results from the baseline and scaled environments can be compared and analysed, describes some key factors that can affect storage performance, and discusses guidelines to ensure system stability in a scaled environment.

 

2.     Performance and Scalability terminology

The primary objective of performance and scalability testing is to find out how far your storage systems can scale without performance degradation. Some of the important terms are discussed briefly below.

  • Performance

Performance is the speed at which a storage system operates. IOPS, throughput and latency are considered the key measurements of storage system performance.

  • Scalability

Scalability is the ability of a storage system to continue to function with minimal or no drop in performance when it grows in size, volume or any other parameter.

  • IOPS

IOPS stands for input/output operations per second. If IOPS are analysed in isolation, the results can be misleading; IOPS are meaningful only when considered together with latency and the workload (e.g. block size, sequential or random access). For example, if a storage system produces 2000 IOPS with a 4K block size and 1000 IOPS with an 8K block size, it does not mean the storage system performs better with 4K blocks than with 8K blocks. In fact, performance is equal in both cases when only the block size differs and all other factors are kept constant.
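
To see why, multiply IOPS by block size to get the delivered throughput in each case:

2000 IOPS x 4 KB per IO = 8000 KBps
1000 IOPS x 8 KB per IO = 8000 KBps

The storage system moves the same amount of data per second in both cases; only the size of each IO differs.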

  • Throughput

Throughput is the amount of data transferred in a unit of time and is measured in kilobytes per second (KBps) or megabytes per second (MBps).

  • Latency

Latency is the time taken to complete an IO request and is usually measured in milliseconds (ms).

  • IO Workload

An IO workload defines the block size, the read/write percentage and the percentage of random versus sequential access.

  • Establishing Baseline

Establishing a baseline is the process of defining acceptable performance results on a predefined hardware and software configuration. This requires executing different workloads and agreeing with the stakeholders on what constitutes acceptable performance. As the scalability acceptance criterion, you can also define how much performance degradation is acceptable with respect to the baseline.

The hardware and software configuration should be defined depending upon the test case and the measurements required. In this process, you will finalize the number of storage nodes, the number of volumes, the number of CPUs and the amount of memory for virtual machines, the volume size, the cache size, etc.

For example, a LUN or volume scalability test might require 8 storage nodes in the cluster with 16 volumes created as the baseline configuration. As the test is expected to verify volume scalability, the count of storage nodes in the cluster and all configuration parameters other than the number of volumes remain constant in the scaled configuration.

For node scalability, the baseline configuration might require just 2 storage nodes. As the system scales, the storage node count increases, while the number of volumes and all configuration other than the count of storage nodes remain constant.

In the case of volume scalability, IOPS in the scaled configuration are compared with the baseline configuration at various queue depths, whereas for node scalability the IOPS comparison is performed at various node counts.

 

3.     Finding scalability limits with some examples

When a storage system scales in terms of the number of storage nodes or the number of LUNs/volumes and its capacity is almost full, you might not get the expected performance. Monitoring is required at the storage, compute, network and virtualization levels. You can finalize the list of workloads that need to be simulated based on the applications you are going to run in the production environment. Charts and graphs based on periodically collected IOPS, throughput and latency are very useful for finding deviations in performance after the storage system scales; a simple collection loop is sketched below.
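
As a minimal sketch of host-side collection on Linux initiators (the tools, file names and the 5-second interval are assumptions; equivalent counters should also be gathered on the array, hypervisor and network side):

#!/bin/sh
# Collect extended device statistics (IOPS, throughput, await/latency), network
# counters and CPU/memory usage every 5 seconds for charting against the baseline run.
RUN_ID=$(date +%Y%m%d-%H%M%S)
iostat -xt 5 > iostat-$RUN_ID.log &
sar -n DEV 5 > network-$RUN_ID.log &
vmstat 5 > vmstat-$RUN_ID.log &
# Stop the collectors (kill or Ctrl-C) when the test run ends
wait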

The first step in finding scalability limits is to compare the performance measured in the baseline configuration with that measured in the scaled configuration. The following two examples demonstrate performance deviation: the first shows deviation within acceptable limits, whereas the second shows that performance degrades significantly after the system scales. The examples are explained with the help of IOPS and latency plotted against multiple queue depths, which are important parameters from a scalability testing perspective. Please note that these graphs and results are for illustration purposes only.

In order to test the scalability of the system, the storage cluster was prefilled to 90% of its capacity, and the LUNs/volumes were distributed equally across the storage nodes that are members of the clustered storage system. The setup configuration is explained in Table 1.

Workload configuration used in examples:

Example 1:       Block size = 8K, Read = 60%, Write = 40% and Access = random

Example 2:       Block size = 8K, Read = 0%, Write = 100% and Access = random
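
For illustration, the two workloads could be generated with an open-source IO tool such as fio; the tool choice, device path and queue depth shown here are assumptions, and any load generator with equivalent parameters will do.

# Example 1: 8K random, 60% read / 40% write (queue depth 16 shown; vary it per test point)
fio --name=example1 --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --bs=8k --rw=randrw --rwmixread=60 --iodepth=16 --runtime=300 --time_based

# Example 2: 8K random, 100% write
fio --name=example2 --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --bs=8k --rw=randrw --rwmixread=0 --iodepth=16 --runtime=300 --time_based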

Table 1. Setup configuration

Figure 1. IOPS and latency comparison of the baseline and scaled environments (Example 1 workload)

Figure 1 shows the IOPS and latency comparison between the baseline environment and the scaled environment in a case where no scalability or performance issues were found. At all queue depths, IOPS and latency in the scaled environment are within the acceptable range (+/- 5% in this example) with respect to the baseline environment. At higher queue depths, IOPS saturate and do not increase even if the queue depth is increased further. Throughput and IOPS are directly proportional to each other at a fixed block size. The workload (8K random, 60% read / 40% write) scales well and meets the scalability requirements.

Figure 2. IOPS and latency comparison of the baseline and scaled environments (Example 2 workload)

Figure 2 shows the IOPS and latency comparison between the baseline environment and the scaled environment in a case where scalability and performance issues were observed. At all queue depths, IOPS in the scaled environment show significant degradation when compared with the baseline environment, and latency in the scaled environment is higher than in the baseline environment. The deviation measured in the scaled environment is not within the acceptable range (more than +/- 5% in this example) of the baseline environment. The workload (8K random, 100% write) does not meet the scalability requirements or acceptance criteria. In circumstances like these, we need to find the bottleneck that limits IOPS in the scaled environment.

 

4.   Factors affecting performance

The factors that affect the performance and scalability of a storage system are listed below.

  • Disk configuration

It is a well-known fact that SSDs are a much faster medium than HDDs. As the total number of drives in a storage pool is increased, performance also increases. These days 10K RPM HDDs are common and result in lower latency than 7K RPM drives. Performance will vary depending upon the RAID configuration and the level of virtualization, so the underlying hardware configuration and the software configuration that virtualizes the storage hardware need to be set up properly. In the case of software-defined storage, local disks and the disks contributing to the cluster or storage pool should be connected to separate HBAs, because local IOs should not be taken into consideration when measurements are taken at the cluster level.

  • Caching

SSDs are commonly used for caching, and the size and number of SSDs used for caching play a role in storage performance. Write-through or read-ahead caching is used to improve read performance, whereas write-back caching is used to improve write performance. If you experience a sudden drop in write performance, it could be because the cache is 100% full; if the rate at which data is being written stays consistently high, then even flushing will not help. Statistics such as cache hits and misses need to be monitored.

  • CPU and Memory Resources

It is necessary to monitor the CPU and memory usage of hypervisors, virtual machines (VMs), storage nodes and servers in order to find bottlenecks. Servers that perform IOs on SSDs might require more CPU resources than servers performing IOs on HDDs. In a virtualized environment, the total number of virtual sockets and cores per socket assigned to VMs are important settings from a scalability point of view. There are other settings such as CPU affinity, page sharing and memory ballooning; we recommend reading the hypervisor documentation for advanced CPU and memory configurations, which are not in the scope of this document.

  • Workload

In general, read operations are faster than write operations. On HDDs, sequential IOs perform better than random IOs because of the seek time incurred for each block of random IO. On SSDs, random write performance is also slower than sequential write performance.

  • Background Operations

When performance measurements are taken, background operations such as disk zeroing, RAID rebuild/recovery due to disk failure or replacement, and restriping due to a storage node shutdown/reboot or a change in RAID level should not be running.

  • Full Stroke

In a full-stroke test, the data read or written during the performance test should be spread across all disks and all clustered storage nodes. Create or pick LUNs for IO in such a way that all disks of the storage pool or cluster are used. A short-stroke test touches only a small portion of an HDD and, due to the low seek time, might give you a good performance (lower latency) result that leads to an incorrect conclusion. Full-stroke performance results are more reliable than short-stroke results.

  • Multipathing

Multipathing policies configured on the initiator side determine how IOs are distributed across multiple paths. For example, in the case of dm-multipath on Linux (multipath.conf), 'path_grouping_policy' decides how many paths are used to transfer data and 'path_selector' decides how IOs are distributed across those paths, as sketched below.
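
A minimal multipath.conf device section might look like the sketch below; the vendor and product strings are placeholders and must match the array's SCSI inquiry strings.

devices {
        device {
                # vendor/product are placeholders for the array under test
                vendor                  "EXAMPLE"
                product                 "LUN"
                # keep all paths in a single group so that every path carries IO
                path_grouping_policy    multibus
                # spread IOs across the paths within the group
                path_selector           "round-robin 0"
        }
}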

  • Command queuing

Multiple SCSI commands can be active on a LUN at the same time. Queue depth is the number of commands that can be active at a time and is configurable at the SCSI driver level. If the hypervisor issues more commands than the configured queue depth, queuing takes place at the hypervisor level. Under normal circumstances, a command issued to a disk in the storage array is executed immediately. It is not recommended for the hypervisor (or the VMs running on it) to consistently issue more commands than the LUN queue depth, as this might result in queuing on the storage array as well.
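
On a Linux initiator, for example, the per-LUN queue depth can be inspected and adjusted through sysfs; the device name and value below are illustrative, and HBA driver module options may impose their own limits.

# Current queue depth of a LUN as seen by the Linux SCSI layer
cat /sys/block/sdb/device/queue_depth

# Adjust it for an experiment
echo 64 > /sys/block/sdb/device/queue_depth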

  • Network Configuration

Sometimes incorrect network configuration contributes to degraded storage performance, so industry best practices should be followed, such as keeping the management network and the data-path/VM network separate and using VLANs to control broadcast traffic. NIC teaming and TCP offload engine (TOE) capable NICs can be used to enhance iSCSI performance. When you decide to use jumbo frames, make sure that an MTU of 9000 is configured on all network equipment end to end, and verify it, for example as shown below.
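
A quick way to set and verify jumbo frames on a Linux host is sketched below; the interface name and target IP are placeholders.

# Set a 9000-byte MTU on the storage-facing interface
ip link set dev eth1 mtu 9000

# Verify end to end: 8972 data bytes + 28 bytes of ICMP/IP headers = 9000 bytes,
# and -M do forbids fragmentation, so the ping fails if any hop uses a smaller MTU
ping -M do -s 8972 -c 3 192.168.50.10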

  • Virtualized Environment

In a virtualized environment, as far as storage performance is concerned, thick-provisioned, eager-zeroed LUNs/virtual disks should be used. Thin-provisioned LUNs can lead to misleading performance results because disk zeroing takes place just before the actual write.
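
As a sketch for a VMware ESXi environment (the datastore path, disk names and size are placeholders), an eager-zeroed thick virtual disk can be created up front, or an existing thin disk inflated, so that zeroing does not happen during the measured run.

# Create a 100 GB eager-zeroed thick virtual disk on the ESXi host
vmkfstools -c 100G -d eagerzeroedthick /vmfs/volumes/datastore1/perf-vm/perf-disk.vmdk

# Inflate an existing thin-provisioned disk to eager-zeroed thick
vmkfstools --inflatedisk /vmfs/volumes/datastore1/perf-vm/old-thin-disk.vmdk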

 

5.     System stability in scaled environment

The continuing challenge with storage systems is how to deal with escalating requirements in a manageable, smooth and non-disruptive manner. Multiple storage admins work on large environments simultaneously and perform operations such as creating volumes and snapshots, assigning LUNs, etc. When the system scales to a large extent, multiple requests from multiple hosts arrive at the storage system at the same time, and the reliability of the system should be maintained under that load.

The level of concurrency defines how these incoming requests are distributed across storage nodes for processing. A test activity to verify the efficiency of processing concurrent requests must be carried out: concurrent requests are sent to the storage nodes and the response time of each request is measured. The level of concurrency and the response time for the requests should not degrade, and the storage system should respond gracefully to all requests; a simple way to generate such a load is sketched below.
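
A minimal sketch of such a concurrency test is shown below; 'mgmt_cli' and its options are placeholders for whatever management CLI or API the storage system provides, and the request count is arbitrary.

#!/bin/sh
# Fire N concurrent volume-create requests and record the wall time of each one.
N=50
for i in $(seq 1 $N)
do
    ( /usr/bin/time -f "vol$i %e s" mgmt_cli create-volume --name vol$i --size 10G ) &
done
wait    # all requests have completed; inspect the recorded response times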

Stability testing can be performed by perturbing the system: while IOs are being performed, shutting down, rebooting, removing or adding nodes in the cluster are some of the tests that can be carried out, as in the sketch below.
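
For example, a perturbation test can be scripted roughly as follows; the node name, device path, IO tool and timings are all assumptions.

#!/bin/sh
# Run the Example 1 workload in the background, then reboot one storage node mid-run.
fio --name=example1 --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --bs=8k --rw=randrw --rwmixread=60 --iodepth=16 --runtime=1800 --time_based &
FIO_PID=$!

sleep 300                          # let the workload reach a steady state
ssh root@storage-node-3 reboot     # perturb the cluster

wait $FIO_PID                      # the job should complete without IO errors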

5.1.   Test automation and tools

Creating a large setup repeatedly and generating a load of concurrent requests require test automation and tools to be in place. In the absence of test automation and tools, human errors are often introduced, leading to misleading results and increased time to prepare the setup and execute the test strategy.

6.     Conclusion

As a storage system scales, it is important to find out whether performance degrades. If there is degradation in performance, then the extent of the degradation needs to be known. Degradation can be detected and reduced by using a systematic approach of comparing against baseline results and constantly monitoring resources. System stability can be maintained even when the storage system scales to a large extent.

 

 

By Mahesh Kamthe and Shubhada Savdekar

mahesh.kamthe@agiliad.com

shubhada.savdekar@agiliad.com


Storage Accelerators: Bridging Cloud Computing Storage I/O Bottleneck

It’s no secret that the cloud computing market has been growing rapidly for both public and private deployments, driving hyper-scale infrastructure to store, process, and deliver ever-accelerating data demands. To meet growing cloud application demands and cut infrastructure costs, public and enterprise clouds are increasingly using virtual machines (VMs) to consolidate applications onto fewer servers.

But having addressed the problem of under-utilized servers with enterprise virtualization, the next big challenge cloud computing faces is how to solve the storage I/O bottleneck that comes with large-scale virtual machine (VM) deployments. As the number of VMs grows, the corresponding explosion in random read and write I/Os inevitably brings a network-attached storage / storage area network (NAS/SAN) array or local direct-attached storage (DAS) to its knees as disk I/O or target-side CPU performance becomes the bottleneck.

To work around these pain points, storage managers are adding capacity to their storage infrastructures to meet performance demand. Essentially they are trying to provide the storage system with access to enough hard disk spindles so that it can respond more quickly to the massive random I/O these environments can generate. These "solutions" lead to racks and racks of disk shelves and very low actual capacity utilization, and they do not scale effectively in purchase cost, administration overhead or maintenance expenses (power, cooling, floor space), hence justifying the need for storage accelerators.

Options available and the propitious solution

This is where SSDs are alluring the storage innovators. Although relatively low in capacity, solid state storage provides extremely high input/output operations per second (IOPS) performance that can potentially solve most storage I/O related challenges in the modern data center today.

Vendors like EMC, Marvell, QLogic, NetApp and Dell are all attempting to develop solutions to bridge their customers to SSD. Following are the multiple ways in which SSDs can be deployed, along with their individual limitations:

Fixed Placement:

Fixed placement on solid state storage may be acceptable for certain workloads where specific subsets of data can be placed on SSDs, database application hot files (e.g. indexes, aggregate tables, materialized views) being good examples. However, it does not support a full complement of storage services (snapshots, replication, etc.), many implementations lack complete high-availability options, or the cost to implement high availability is simply too high, ruling this out as a general solution.

Automated Tiering:

Automated tiering works by moving sections of data to high-performance storage as they become active and then demoting them as they become less active. But to implement this solution, the storage system must support automated tiering, which may require upgrading to new storage infrastructure. Secondly, depending on the size of the sections of data to be promoted, the time it takes for the storage controller to analyze data access patterns and start promoting data to the SSD tier can delay the time to ROI by days or weeks. The third and considerable limitation of this option is the wear-out of SSDs caused by write amplification due to the constant reading and writing of large chunks of data.

Cache Appliances:

To alleviate some of these issues, several third-party manufacturers have created external caching appliances. These systems sit inline between the servers accessing storage and the storage itself; in other words, all traffic must flow through the devices. This solution does create a broad caching tier for the environment, providing a high performance boost to more storage, but it may be too broad, since not all data going through these devices may be appropriate for caching. Because of their inline nature, solid state caches are also vulnerable to the performance limitations of the storage network and the storage controller. Finally, the inline caching appliance itself can become a limiter to scale and be overrun when many application servers channel storage I/O through it.

Limitations to OFF-SERVER solutions available

The three solutions mentioned above do not actually improve the performance of the storage network or the storage controller; in fact, they often expose its shortcomings.

They also ignore the fact that the device needing the storage I/O performance boost is the application server or virtual host. This is where server-based storage accelerators are gaining appeal.

Server Based Storage Acceleration

Server-based acceleration via caching takes the concepts of the cache appliance and moves them into the server, typically via a PCIe card. This provides several significant advantages. First, the problem is being fixed closer to the source (the application or hypervisor), and cached I/O does not need to traverse the storage network. Secondly, instead of deploying something universally to solve a specific problem, the solution is most cost-effective when deployed discretely, only to the servers where the problem exists and which need more performance than other, relatively less loaded servers.

Variations to Server Based Caching Accelerators:

Software Based Server accelerators:

In this category of software for virtualized servers, the caching decision is made on the host, much closer to the source than with either caching appliances or disk arrays.

Leading the chase is FusionIO’s ioTurbine, an application-level caching software in which the caching software runs in the background as a component in the hypervisor and in the guest operating system. The caching decision is made in the guest OS rather than the host OS, right where the application is generating the data, enabling accelerated performance directly for the virtualized applications that require it.

What helps FusionIO’s ioTurbine outperform and provide a low-latency, high-IOPS caching solution is the ioDrive’s VSL layer. With VSL, the CPU interacts directly with the ioDrive as though it were just another memory tier below DRAM; otherwise access would have to be serialized through a RAID controller and embedded processors, resulting in unnecessary context switching and queuing bottlenecks and eventually leading to high latency.

The Virtual Storage Layer (VSL) virtualizes the NAND flash arrays by combining key elements of operating systems, namely the I/O subsystem and the virtual memory subsystem. It uses "block tables" that translate block addresses to physical ioDrive addresses, analogous to a virtual memory subsystem. With VSL, these block tables are stored in host memory, whereas other solid state architectures store them in embedded RAM and hence have to pass through legacy protocols.

However, such software consumes host resources such as CPU and memory for flash management tasks like wear leveling and garbage collection, which are heavy users of CPU cycles that by all rights should be dedicated to serving the application.

Secondly, some other caching software offerings are currently not integrated with solid state devices: the software is purchased from one vendor and the solid state device from a different vendor. This typically leads to questions and issues, such as whether the software and the solid state device work well together.

Hardware Based Server accelerators

Differentiating from software-based accelerators, hardware-based storage server accelerators (SSAs) may take one of the following forms:

  1. Integrated storage adapters i.e. HBA or NIC enabled caches
  2. Solid state PCIe devices/ SAS or SATA SSD devices.

Solid state PCIe Adapter Devices:

PCIe adapter storage accelerators, usually comprising SSDs, DRAM and embedded firmware, work by intercepting IO, redirecting it to high-speed local storage (SSDs) and accelerating it. This requires the combination of a tightly coupled IO interaction layer and an innovative hardware layer built around a special-purpose ASIC.

This intermediation is not elementary, and it is only available today because of intersecting trends around operating systems, virtualization, consolidation and processing power that have enhanced vendors' ability to interact with storage IO paths. Today we have a much better IO stack to interact with than ever before, irrespective of the operating system, application or hypervisor under consideration, which makes server-based acceleration possible. An additional unique feature is the ability to create a host I/O cache that is agnostic to all network and local-attached storage protocols: it can be configured to serve as a data cache for DAS, SAN or NAS storage arrays, irrespective of the protocols, such as iSCSI, SAS or NFS, used to access the data storage.

Leading the innovation are well-known vendors like Marvell with DragonFly and EMC with VFCache.

Marvell’s DragonFly enables the creation of a next-generation cloud-optimized data center architecture where data is automatically cached in a low-latency, high-bandwidth "host I/O cache" in application servers on its way to and from higher-latency, higher-capacity data storage. A unique differentiator for Marvell DragonFly is its use of a sophisticated NVRAM log-structured approach for flash-aware write buffering and re-ordered coalescing. Unlike writes to SSDs, which quickly degrade after a certain number of random writes, DragonFly ensures consistently high performance and low latency with zero write performance degradation over time.

On the other hand EMC’s VFCache accelerates reads and protects data by using a write-through cache to the networked storage to deliver persistent high availability, integrity and disaster recovery.

VFCache coupled with array based EMC FAST technology on EMC storage arrays can help place the application data in right storage tier based on the frequency with which data is being accessed. VFCache extends FAST technology from storage array to server by identifying the most frequently accessed data and promoting it into a tier that is closest to the application.

All in all, VFCache is a hardware and software server caching solution that aims to dramatically improve application response time and deliver more IOPS.

Integrated storage adapters (HBA or NIC enabled caches):

In the current market, the first and so far the only product leading the race in integrated storage adapters (i.e. HBA or NIC enabled caches) is QLogic's network-based adapter Mt. Rainier.

Mt. Rainier is a combination of an enterprise server I/O adapter, a flash/SSD adapter, an optimized driver and onboard firmware intelligence. This enhanced network HBA captures all I/O seamlessly and redirects it to flash media attached as PCIe flash storage.

In the future, such accelerators, ready to deploy in the infrastructure with no additional software required, could eventually become the de facto HBA/CNA.

 

REFERENCES:

Jeff Boles (2012). "Server-Based Storage Acceleration". Available: http://www.infostor.com/imagesvr_ce/5383/server-based%20storage%20accel.pdf. Last accessed 24 Oct 2012.

George Crump (2011). "What is Server Based Solid State Caching?". Available: http://www.storage-switzerland.com/Articles/Entries/2011/6/27_What_is_Server_Based_Solid_State_Caching.html. Last accessed 24 Oct 2012.

Jeff Boles (2012). "Storage Performance – Maybe It Never Was the Array's Problem". Available: http://tanejagroup.com/news/blog/blog-systems-and-technology/storage-performance-maybe-it-never-was-the-arrays-problem. Last accessed 24 Oct 2012.

Arun Taneja (2012). "EMC announces PCIe Flash Cache – Fusion IO gets its first major competitor". Available: http://tanejagroup.com/news/blog/systems-and-technology/emc-announces-pcie-flash-cache-fusion-io-gets-its-first-major-competitor. Last accessed 24 Oct 2012.

EMC Corporation (2012). "Introduction to EMC VFCache". Available: http://www.emc.com/collateral/hardware/white-papers/h10502-vfcache-intro-wp.pdf. Last accessed 24 Oct 2012.

Taneja Group (2012). "Bringing Server-Based Storage to SAN". Available: http://www.qlogic.com/Products/adapters/Documents/Taneja%20Group%20-%20Published%20Article%20-%20Server-based%20Storage%20Accelerators%20-%20September%202012%20-%20Final.QLogic.pdf. Last accessed 24 Oct 2012.

Shawn Kung (2012). "Breaking Through Storage I/O Barrier For Cloud Computing". Available: http://www.marvell.com/storage/dragonfly/assets/Marvell_DragonFly_Solutions-002_white_paper.pdf. Last accessed 24 Oct 2012.

Victoria Koepnick (2012). "Not All Caching is Created Equal". Available: http://www.fusionio.com/blog/in-a-virtualized-environment-not-all-caching-is-created-equal/. Last accessed 24 Oct 2012.

Trends in Storage – Transition to All-Flash & beyond

The use of flash in storage devices is not new; however, we are now seeing increased use of flash compared to disk storage. Almost all storage companies now provide hybrid solutions that mix SSDs and HDDs in their boxes, and those that don't are switching over completely to leverage the advantages of SSDs. As the cost of SSDs plummets further, we will see SSDs being used more aggressively in storage boxes. Companies like Avere, Marvell and Starboard are providing unique offerings with SSD-supported devices, and companies like XtremIO (recently acquired by EMC) with all-flash products will soon enter the fray. Looking forward, there are some new memory technologies that could potentially replace flash in the years to come.

Flash Technology

NAND flash technology is only a decade old. However, it has already gained significant traction due to its mechanical characteristics and performance. SSDs with NAND flash have a number of advantages over HDD devices. Some of them are:

  • Power savings of 2x
  • No noise
  • No vibration (since there are no moving parts)
  • Very little heat
  • About 30% faster than HDDs
  • Magnetic field safe

SSD costs, although falling, are still higher than HDD costs. This is the only factor preventing a complete replacement of HDDs in storage products. See this article in Storage Review for a detailed comparison between SSD and HDD drives.

SSD offerings

Storage companies are already offering several solutions built around SSD drives in their storage servers and boxes. There are also ways in which SSDs can be utilized in the storage environment in a transitional manner while improving value proposition for customers.

Major vendors like EMC Corp. and NetApp Inc. have placed flash memory in their storage arrays and designed controller software to use the flash memory as a cache. EMC FAST (Fully Automated Storage Tiering) Cache improves the performance of existing SATA, FC and SAS drives. NetApp, on the other hand, uses FlashCache to improve performance; this also compensates for the performance penalty of its de-duplication technology (designed for capacity optimization). See this article by Joerg Hallbauer for a nice comparison between these technologies.

Avere Systems and Marvell take a different standpoint. Avere's FXT caching appliance sits between NAS arrays and clients; Ron Bianchini, founder and CEO of Avere Systems, claims that the appliance delivers 50 times lower access latency than existing NAS devices. Marvell's DragonFly VSA is designed to be placed inside the server and uses NVRAM and SSD caches for random write handling.

Storage vendors are also transforming their fixed RAID systems into automatically tiered storage devices. EMC's FAST Virtual Pool is an example of a device in this category: it places only data that requires high-speed access on SSD drives, while data that is only moderately used is placed on SAS drives. Starboard Storage, in its AC72 system, also utilizes SSDs and HDDs with automated tiering; data that is less frequently used is targeted towards HDDs.

By moving “hot” data to faster storage devices, tiered storage systems can perform faster than similar devices without the expense of widely deploying these faster devices.  Conversely, automated tiering can be more energy- and space-efficient because it moves “bulk” data to slower but larger-capacity drives.

Storage vendors are also coming up with “All Flash” products – despite the costs involved—to cater to customers that demand speed.  EMC announced “Project X” recently that utilises XtremIO technology to provide an all flash storage box that is fast, and uses in-line de-dup technology.

Future Memory Technologies

Even while we are considering the current industry trend towards flash SSD based devices, there are future technologies that could disrupt flash. Potential successor technologies to flash include Resistive RAM (RRAM), Magnetoresistive RAM (MRAM) and Phase-change memory (PCM).  But, more about these memory types in a different article.

VM Portability – OVFTool

VM portability has become even more important with the evolution of virtualization in cloud computing. It involves not only pushing virtual images around, but also the various configurations of application, data, identity, security and networking. Even if all these components were themselves virtualized, simply porting the virtual instances from one location to another is not enough to assure interoperability, because the components must be able to collaborate, and this requires connectivity and other configuration information.

One of the solutions to this is the OVF/OVA format supported by VMware. OVF is claimed to enable efficient, flexible and secure distribution of enterprise software, facilitating the mobility of virtual machines and platform independence (Xen, KVM, Microsoft, VMware, etc.).

What follows is an attempt to use it with VMware. Generating an OVF/OVA from a virtual machine using the vSphere client turned out to be a simple process: select the VM in vSphere and click "Export to OVF" in the main menu.

However, this is a manual process, and it doesn't help if we need to do it repeatedly for different builds of software or for a large number of VMs. To automate the process, we need some kind of CLI that we can call from a script. VMware's OVFtool provides that capability (and much more, which we will discuss in future posts). So I tried to automate this through shell and Perl scripts. Both scripts install a new RPM on a CentOS VM and then export the VM to OVF format.

SHELL Version:
#!/bin/sh
#
# Description: Install an RPM on the CentOS VM (ip=$1) and export the VM
# to OVF format using ovftool.
#
if test $# -ne 3
then
    echo "Usage - $0 vmIP rpmPath esxIP"
else
    fileName=$(basename "$2")
    # Copy the RPM from the build host to the VM over SSH and install it
    if ssh "$1" "scp 192.168.112.132:$2 .; rpm -Uvh $fileName; exit;"
    then
        # Export the VM named 'CentOS' from the ESX host to an OVF file
        ovftool "vi://root@$3/CentOS" "/home/CentOS.ovf"
    else
        echo "Unable to connect to $1"
    fi
fi

PERL Version:
#!/usr/bin/perl -w
#
# Description: Install an RPM on the CentOS VM (vmIP) and export the VM
# to OVF format using ovftool.

use strict;
use Net::OpenSSH;

if ($#ARGV != 2) {
    print "usage: perl installRPM_ExportVM2OVF.pl vmIP rpmPath esxIP\n";
    exit;
}
my $vmIP    = $ARGV[0];
my $rpmPath = $ARGV[1];
my $esxIP   = $ARGV[2];

# Connect to the CentOS VM over SSH
my $sshCentOSVM = Net::OpenSSH->new($vmIP, user => 'root', password => 'PASSWORD');

$sshCentOSVM->error and die "Unable to connect to remote host: " . $sshCentOSVM->error;

# Derive the RPM file name from its path
my @values = split('/', $rpmPath);
my $fileName = $values[$#values];

# Copy the RPM from the build host to the VM and install it
$sshCentOSVM->system("scp 192.168.112.132:$rpmPath .; rpm -Uvh $fileName;");

# Export the VM named 'CentOS' from the ESX host to an OVF file
system("ovftool vi://root@" . $esxIP . "/CentOS /home/CentOS.ovf");
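
A typical invocation of the Perl variant (the IP addresses and RPM path below are illustrative) would be:

perl installRPM_ExportVM2OVF.pl 192.168.112.140 /builds/myapp-1.0.rpm 192.168.112.10

The shell version takes the same three arguments in the same order.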

Upcoming posts will include the usage for other hypervisor platforms.

Trends in Storage – Phase Change Memory (PCM)

What is Phase Change Memory?

Phase change memory (PCM) is an emerging non-volatile solid-state memory technology employing phase change materials. It has been considered a possible replacement for both flash memory and DRAM, but the technology still needs to mature before it can be put to production use.

We may not realize it, but we are already using phase change materials to store data – they are used in re-writeable optical storage, such as CD-RW and DVD-RW discs.  For optical drives, bursts of energy from a laser put tiny regions of the material into amorphous or crystalline states to store data. The amorphous state reflects light less effectively than the crystalline state, allowing the data to be read back again.

Phase change materials, such as salt hydrates, are also capable of storing and releasing large amounts of energy when they move from a solid to a liquid state and back again.  Traditionally, they have been used in cooling systems and, more recently, in solar-thermal power stations, where they store heat during the day that can be released to generate power at night.

However, there are additional properties of PCM being researched that may allow for new and exciting uses of these materials.

For memory devices it is not their thermal or optical properties that make PCMs so attractive. Instead it is their ability to switch from a disorderly (or amorphous) state to an orderly (or crystalline) one very quickly.  PCM memory chips rely on glass-like materials called chalcogenides, typically made of a mixture of germanium, antimony and tellurium.  In PCM the pronounced change in electrical resistivity when the material changes between its two stable states, namely the amorphous and poly-crystalline phases, is used.

Promise of PCM

With a combination of speed, endurance, non-volatility and density, PCM can enable a paradigm shift for enterprise IT and storage systems as soon as 2016.  The benefits of such a memory technology would allow computers and servers to boot instantaneously and would significantly enhance the overall performance of IT systems.  PCM can write and retrieve data many orders of magnitude faster than flash, enable higher storage capacities, and also not lose data when the power is turned off.

Phase change materials are also being considered for the practical realization of ‘brain-like’ computers where a PCM cell is used to act like a hardware neuron and to have a synaptic like functionality via the ‘memflector’, an optical analogue of the memristor.

How does Phase Change Memory work?

PCM memory chips consist of chalcogenide sandwiched between two electrodes. One of the electrodes is a resistor which heats up when current passes through it. A gentle pulse of electrical energy causes the resistor to heat the chalcogenide and make it melt. As the material cools, it forms a crystalline structure; this state corresponds to the cell storing a "1". When a short, stronger pulse is applied, the chalcogenide melts but does not form crystals as it cools; it assumes a disorderly amorphous state corresponding to a "0". The amorphous state has higher electrical resistance than the crystalline state, which is why PCM memory cells are also sometimes referred to as "memristors". This complete process is reversible and controlled by the application of currents, so the PCM cell can switch between "0" and "1" over and over again.

If the amount of current provided to PCM can be controlled, then chalcogenide enters an intermediate state which is a combination of amorphous and crystalline phases.  This is the principle of multilevel PCM which can store multiple bits of information in a single cell.

IBM researchers have built PCM memory chips with 16 states (or four bits) per cell, and David Wright, a data-storage researcher at the University of Exeter, in England, has built individual PCM memory cells with 512 states (or nine bits) per cell. But the larger the number of states, the more difficult it becomes to differentiate between them, and the higher the sensitivity of the equipment required to detect them, he says.

When was PCM discovered?

Although the concept of Phase Change Materials came along some 40 years ago, it was only in 2011, that scientists at IBM Research demonstrated that PCM can reliably store multiple data bits per cell over extended periods of time.

What is the performance of Phase Change Memory?

PCM exhibits highly desirable characteristics, such as rapid state transition, good data retention and performance, as well as future scaling to ultra-small device dimensions.  Writing to individual flash-memory cells involves erasing an entire region of neighbouring cells first.  This is not necessary with PCM memory, which makes it much faster.  Indeed, some prototype PCM memory devices can store and retrieve data 100 times faster than flash memory.

Another benefit of PCM memory is that it is extremely durable, capable of being written and rewritten at least 10 million times. Flash memory, by contrast, wears out after a few thousand rewrite cycles, because of the high voltages required to move electrons in and out of the floating-gate enclosure. Accordingly, flash memory needs special controllers to keep track of which parts of the chip have become unreliable, so they can be avoided. This increases the cost and complexity of flash, and slows it down.

PCM is also inherently fast because the phase-change materials can flip their phase very quickly, on the order of a few nanoseconds. Recently it has been shown through simulations of these materials that the phase-change mechanisms can happen on the sub-nanosecond time scale as well.

In addition, PCM offers greater potential for future miniaturisation than flash.  As flash-memory cells get smaller and devices become denser, the number of electrons held in the floating gate decreases.  Because the number of electrons is finite, there will soon come a point at which this design cannot be shrunk any further.  PCM offers a radically different approach.  With PCM, the changes between the crystalline and amorphous states don’t involve the movement of electrons.  Therefore, by nature, phase change is less harmful to the material and it doesn’t deteriorate as easily over time as flash.

The IBM research team believe that the multi-level phase change memory technology could be ready for use by 2016.

How will PCM be used?

Replacing flash is not going to be easy though.  Flash technology has a huge customer base.  As of today, flash is the most advanced technology of all the solid-state technologies out there. However, Flash and PCM may play in different spaces.  PCM could serve as the main memory for enterprise class applications due to its very high endurance and better latency properties.  PCM could also complement DRAM in future products where instead of using a small DRAM, there could be a bigger pool with PCM and DRAM, with the DRAM serving as a cache for the PCM.

At the same time, some of the biggest memory manufacturers are already considering moving to PCM as a replacement for NOR flash (used in cell phones).  NOR flash stores source code.  Because NOR flash is reaching the end of its scaling pathway, this is one area where people think that PCM can enter the market.

The technology could benefit applications such as “big data” analytics and cloud computing.

Operating systems, file systems, databases and other software components need significant enhancements to enable PCM to live up to its potential.  Studies show that any piece of software that spends a lot of time trying to optimize disk performance is going to need significant reengineering in order to take full advantage of these new memory technologies.

Who is leading the work on Phase Change Memory?

Companies like Micron Technology, Samsung and SK Hynix—the three giants of digital storage—are already applying PCM inside memory chips.  The technology has worked well in the laboratory for some time and is now moving towards the mainstream consumer market.  Micron started selling its first PCM-based memory chips for mobile phones in July, offering 512-megabit and one-gigabit storage capacity.

IBM is now working with SK Hynix to bring multi-level PCM-based memory chips to market.  The aim is to create a form of memory capable of bridging the gap between flash, which is used for storage, and dynamic random-access memory, which computers use as short-term working memory, but which loses its contents when switched off.  PCM memory, which IBM hopes will be on sale by 2016, would be able to serve simultaneously as storage and working memory—a new category it calls “storage-class memory”.

Conclusion

PCM promises to be smaller and faster than flash, and will probably be storing your photos, music and messages within a few years.

In short, PCM memory does not merely threaten to dethrone flash; it could also lead to a radical shift in computer design, a phase change on a much larger scale.

References

  • The paper “Drift-tolerant Multilevel Phase-Change Memory” by N. Papandreou, H. Pozidis, T. Mittelholzer, G.F. Close, M. Breitwisch, C. Lam and E. Eleftheriou, was recently presented by Haris Pozidis at the 3rd IEEE International Memory Workshop in Monterey, CA.
  • The Economist: “Phase-change memory, Altered states”, Q3 2012
  • IBM Research, Zurich. “IBM scientists demonstrate computer memory breakthrough”
  • Search Solid State Storage. “UCSD lab studies future changes to non-volatile memory technologies”
  • Search Solid State Storage. “New memory technologies generate attention as successor to NAND flash”
  • Arithmetic and Biologically-Inspired Computing Using Phase-Change Materials by C. David Wright, Yanwei Liu, Krisztian I. Kohary, Mustafa M. Aziz, Robert J. Hicken

Spotlight – XtremIO

Introduction

On May 10, 2012, EMC announced that it acquired privately held XtremIO.  This article talks about XtremIO, the technology, the reasons behind the acquisition, and what it means for other big players.

About the Company

XtremIO is based in Herzliya, Israel (“The Start-Up Nation”). It was founded in 2009 and has raised $25 million in venture capital funding.  It provides an “All-flash” technology product built from the ground up using data reduction techniques such as inline deduplication to lower costs and save capacity.

It competes against other all-flash array makers such as Solid Fire, Texas Memory Systems (TMS), Violin Memory, Nimbus Data, Pure Storage and Whiptail.

Technology

XtremIO describes its own all-flash array as having a scale-out clustered design where additional capacity and performance can be added when needed.  It also has no single point of failure and supports real-time inline data deduplication.  All-Flash means that the XtremIO system supports high levels of I/O performance, particularly for random I/O workloads that are typical in virtualized environments, with consistently low (sub-millisecond) latency.  It also has integration to VMware through VAAI.

XtremIO won a 2012 Green Enterprise IT award from the Uptime Institute for IT Product Deployment.

Acquisition of XtremIO by EMC

Israel-based companies, for the most part, are not great at selling; what they are great at is engineering. Companies like EMC and NetApp have big sales channels and can pick up small Israeli start-ups relatively cheaply, primarily for their technology. The XtremIO acquisition was reported to be valued at $430 million.

EMC and XtremIO also have natural ties in part because XtremIO co-founder Shuki Bruck sold his previous company Rainfinity to EMC.

Big competitors, including NetApp, HP, Dell, IBM, and Hitachi Data Systems may feel pressured to get in the game and look for such companies to acquire, reports Derrick Harris.  Indeed, NetApp was reportedly also trying to make a bid for XtremIO.

EMC Advantage

All-flash arrays are expensive, high-performance systems built for applications requiring high throughput, such as relational databases, big data analytics, large virtual desktop infrastructure or processes requiring large batch workloads like backups.

Flash arrays can deliver high performance using a relatively small amount of rack space, power and cooling.

The all-flash array of the type XtremIO offers will give EMC faster performance across both virtualized and big data environments, meaning it will also help EMC’s subsidiary VMWare, which focuses on virtualization. Combined with EMC’s server-side PCI flash product called Project Lightning, which keeps hot data in an SSD cache sitting alongside the processor, that’s one powerful hardware platform for tomorrow’s applications.

EMC needed new technology, and rather than develop it in house, it chose to buy that technology, and a strong flash storage development team. The other large storage vendors will probably make similar purchases to catch up.

Rather than combine Isilon and VNX somehow, EMC acquired XtremIO. XtremIO offers scale-out, great data management and great performance.  In fact, their subsystem was built specifically for flash, whereas flash was an afterthought for NetApp (they still leverage an HDD-optimized subsystem).

Industry Impact

It is clear Flash is going to become even more imperative for the big storage players and getting in first with XtremIO might pay off for EMC and become the deal of the year.

With pressure mounting on other big-players to catch-up with EMC, there are other similar companies like XtremIO that may be the next target for possible acquisitions. Fusion-IO, Violin Memory, Virident or Kaminario could be possible acquisition targets that other players might be looking at.

EMC Project X

At VMworld 2012, EMC showed an early version of the all-flash array based on XtremIO technology.  Project X, as the array is known for now, has been revealed to have dual Intel-based controllers in each X-brick scaling unit along with a shelf of flash drives, 2 host adaptors with 2 ports each (supporting FC and iSCSI), and Infiniband connecting the modules together in a scale-out manner.

The demo claimed 2600% dedupe rates.  The dedupe is global, inline, always on and is said to extend SSD lifespans by reducing the rate of writes to each drive.  The array delivers a predictable sub-millisecond I/O response time for every 4K block no matter what you happen to be doing: read, write, sequential, random, snaps, etc.  The formerly big number of a million IOPS can result from a very modest configuration of XtremIO modules.

The price of the new machines was not disclosed or even discussed, but a likely release date of somewhere in the first half of 2013 remains on EMC’s agenda.

References

  • The register, “EMC shows off XtremIO’s Project X box”
  • VentureBeat.com, “EMC’s buy of XtremIO for $400M could spur M&A rush in flash storage”
  • VentureBeat.com, “Flash storage mania — EMC buys XtremIO, eyes turn toward Violin”
  • Gigaom, “If EMC buys XtremIO, the flash war is on”
  • EMC, “VMware view solution guide”
  • Computer Weekly, “XtremIO: Costly mistake or genius deal for EMC?”
  • Chuck’s Blog, “When Flash Changed Storage: XtremIO Preview”

Spotlight – Akamai: Pioneer in CDN

Akamai was recently in the news for acquisitions of Blaze Software Inc. as well as Cotendo Inc. Akamai has done it again with yesterday’s acquisition of FastSoft Inc., a provider of content acceleration software.  The acquisition is expected to enhance Akamai’s cloud infrastructure solutions with technology for optimizing the throughput of video and other digital content across IP networks.

This article talks about Content Delivery Networks in general, Akamai, and its recent acquisitions.

Overview of Content Delivery Network (CDN)

Today the internet has about 77 TBps of global capacity. As the internet grows bigger, the number of Internet Exchange Points (IXPs) across the world has increased from 50 in 2000 to over 350 in 2012. Today, when a person requests a video stream or an internet download, the data is sent through a content delivery network and does not need to travel as far as it would if it were sent directly from the source server to the user. As a result, the user gets better quality of service, and server load is also reduced because the data is cached across the content delivery network. Over 45 per cent of web traffic today is delivered over CDNs.

Conceptually, a delivery network is a virtual network built as a software layer over the actual internet, deployed on widely distributed hardware, and tailored to meet the specific system requirements of distributed applications and services.  A delivery network provides enhanced reliability, performance, scalability and security that is not achievable by directly utilizing the underlying Internet.  A CDN, in the traditional sense of delivering static Web content, is one type of delivery network.  Today CDN encompasses dynamic content as well.

Overview of Akamai

Akamai, launched in early 1999, is the pioneer in Content Delivery Networks.  The company evolved out of an MIT research effort to solve the flash crowd problem.  It provided CDN solutions to help businesses overcome content delivery hurdles.  Since then, both the Web and the Akamai platform have evolved tremendously.  In the early years, Akamai delivered only Web objects (images and documents).  It has since evolved to distribute dynamically generated pages and even applications to the network’s edge, providing customers with on-demand bandwidth and computing capacity.

Today, Akamai delivers 15-20% of all Web traffic worldwide and provides a broad range of commercial services beyond content delivery, including Web and IP application acceleration, EdgeComputing™, delivery of live and on-demand high-definition (HD) media, high-availability storage, analytics, and authoritative DNS services.  Comprising more than 61,000 servers located across nearly 1,000 networks in 70 countries worldwide, the Akamai platform delivers hundreds of billions of Internet interactions daily, helping thousands of enterprises boost the performance and reliability of their Internet applications.

Akamai Acquisitions

The following list shows some of the Akamai acquisitions over its history.

  • June 2005, Akamai acquired Speedera Networks valued at $130 million.
  • November 2006, Akamai acquired Nine Systems Corporation valued at $164 million.
  • March 2007, Akamai acquired Netli valued at $154 million.
  • April 2007, Akamai acquired Red Swoosh valued at $15 million.
  • November 2008, Akamai acquired aCerno valued at $90.8 million
  • June 2010, Akamai acquired Velocitude LLC valued at $12 million.
  • February 2012, Akamai acquired Blaze Software Inc., a provider of front-end optimization (FEO) technology.
  • March 2012, Akamai acquired Cotendo, valued at $268 million, which offers an integrated suite of web and mobile acceleration services.
  • September 13, 2012, Akamai acquired FastSoft Inc., a provider of content acceleration software.

The latest acquisition of Akamai is FastSoft Inc. which was launched in 2006 to commercialize network optimization technology.  FastSoft’s patented FastTCP algorithms improve Transmission Control Protocol (TCP), adding intelligence designed to increase the speed of dynamic page views and file transfer downloads while reducing the effects of network latency and packet loss.  FastSoft’s unique technology has helped improve website and web application performance across the first and last miles, as well as through the cloud, without requiring client software or browser plug-ins.  Combining FastSoft with Akamai’s existing network protocols is expected to help enable Akamai to optimize server capacity, deliver higher throughput for video, and bring greater efficiency to its global platform.

If we focus on the 2012 acquisitions of Blaze, Cotendo and now FastSoft, they are indicative of a trend towards providing end-to-end acceleration for an entire leg of the transaction.  With the current proliferation of mobile devices and users accessing internet over mobile devices, Akamai is also targeting various performance improvements and network services to deliver content to these users with lower latency and better security than has previously been available.

Compuware’s Gomez platform is a well-known technology for measuring the performance of Web applications. According to Gomez benchmarks, it takes a mobile Web site 7.7 to 8 seconds to open, versus 2 seconds on a desktop computer, says Pedro Santos, VP of the Mobile Business at Akamai.

“So there is a tremendous opportunity to improve the performance of mobile web sites and applications,” he says, citing user surveys that 71% of consumers expect Web sites to open on a mobile phone as quickly as they do on a desktop computer, and that 77% of organizations today have mobile web pages that take longer than 5 seconds on average to open.

Akamai’s new products like Terra Alta and Aqua Mobile Accelerator also substantiate this trend.

It will be interesting to study the strategic response from Akamai’s competitors like Limelight networks.

References

  • Network computing, “Akamai Boosts Web, Mobile App Performance”
  • http://www.prnewswire.com, “Akamai Acquires FastSoft”
  • Gigaom, “The shape of Internet has changed. It now lives life on the edge”