Performance and Scalability Analysis of Storage Systems


As virtualized environments grow rapidly, so does the need to manage large numbers of virtual objects. Storage systems are part of every environment: virtualized or non-virtualized, public or private cloud, small or large datacentres. The performance and scalability of storage systems are therefore receiving a great deal of attention from professionals, vendors and organizations. This whitepaper focuses on performance and scalability testing of storage systems.


1.     Introduction

Meeting the performance and scalability requirements of storage systems is one of the most important objectives of product teams. The list of parameters that can affect performance and scalability results is long. A fully scaled environment involves a large number of hardware devices, network devices, storage nodes, servers and software components, so finding bottlenecks can become very complex. Setting up a scalability environment accurately is usually time consuming and requires careful adherence to industry best practices for optimized usage of storage systems. Designing and executing test cases, analysing results and drawing conclusions all require a systematic approach.

Before diving into the topic, this whitepaper explains the basic terminology associated with performance and scalability testing of storage systems. With the help of some examples, it then shows how to find the scalability limits of storage systems. Along the way you will learn what a performance baseline is and how it can be established. The paper explains how results from baseline and scaled environments can be compared and analysed, and describes some key factors which can affect storage performance. Guidelines to ensure system stability in a scaled environment are also discussed.


2.     Performance and Scalability terminology

The primary objective of performance and scalability testing is to find out how far your storage systems can scale without performance degradation. Some important terms are discussed briefly below.

  • Performance

Performance is the speed at which a storage system operates. IOPS, throughput and latency are the key measurements of storage system performance.

  • Scalability

Scalability is the ability of a storage system to continue to function with little or no drop in performance when its size, volume or any other parameter changes.

  • IOPS

IOPS stands for input/output operations per second. Analysed in isolation, IOPS can be misleading; they are meaningful only when considered together with latency and workload (e.g. block size, sequential or random access, read or write). For example, if a storage system delivers 2000 IOPS with a 4K block size and 1000 IOPS with an 8K block size, it does not mean the system performs better with 4K blocks than with 8K blocks. With all other factors kept constant, the amount of data transferred is the same in both cases: 2000 × 4 KB = 1000 × 8 KB = 8000 KB per second.
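The arithmetic behind this example can be written as a short Python sketch. The function name and parameters are illustrative only; the point is simply that throughput is IOPS multiplied by block size.

```python
def throughput_kbps(iops, block_size_kb):
    """Throughput in KB per second = IOPS x block size in KB."""
    return iops * block_size_kb


# The two cases from the example above.
small_blocks = throughput_kbps(iops=2000, block_size_kb=4)
large_blocks = throughput_kbps(iops=1000, block_size_kb=8)

# Both cases move the same amount of data per second.
print(small_blocks, large_blocks)
```

Comparing IOPS figures is therefore only meaningful when the block size, and the rest of the workload, is the same on both sides.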

  • Throughput

Throughput is the amount of data transferred per unit of time and is measured in kilobytes per second (KBps) or megabytes per second (MBps).

  • Latency

Latency is the time taken to complete an IO request and is usually measured in milliseconds (ms).

  • IO Workload

IO workload defines block size, read/write percentage and percentage of random/sequential access. 

  • Establishing Baseline

Establishing a baseline is the process of defining acceptable performance results on a predefined hardware and software configuration. This requires executing different workloads and agreeing with stakeholders on what performance is acceptable. As part of the scalability acceptance criteria, you can also define how much performance degradation is acceptable with respect to the baseline.

The hardware and software configuration should be defined according to the test case and the measurements required. In this process, you finalize the number of storage nodes, number of volumes, number of CPUs and amount of memory for virtual machines, volume size, cache size, etc.

For example, a LUN or volume scalability test might require 8 storage nodes in the cluster with 16 volumes created as the baseline configuration. Because the test is expected to verify volume scalability, the number of storage nodes in the cluster and all configuration parameters other than the number of volumes remain constant in the scaled configuration.

For node scalability, the baseline configuration might require just 2 storage nodes. As the system scales, the storage node count increases. The number of volumes and all configuration other than the storage node count remain constant.

In the case of volume scalability, IOPS in the scaled configuration are compared with the baseline configuration at various queue depths, whereas for node scalability the IOPS comparison is performed at various node counts.


3.     Finding scalability limits with some examples

When a storage system scales in terms of number of storage nodes, number of LUNs/volumes, etc. and its capacity is almost full, you might not get the expected performance. Monitoring is required at the storage, compute, network and virtualization levels. You can finalize the list of workloads to simulate based on the applications you are going to run in the production environment. Charts and graphs based on periodically collected IOPS, throughput and latency are very useful for spotting deviations in performance after the storage system scales.

The first step in finding scalability limits is to compare the performance measured in the baseline configuration with that measured in the scaled configuration. The following two examples demonstrate performance deviation. In the first example the deviation is within acceptable limits, whereas in the second example performance degrades significantly after the system scales. The examples are explained with the help of IOPS and latency measured at multiple queue depths, which are important parameters from a scalability testing perspective. Please note that these graphs and results are for illustration purposes only.

To test the scalability of the system, the storage cluster was prefilled to 90% of its capacity. The LUNs/volumes were distributed equally across the storage nodes that are members of the clustered storage system. The setup configuration is explained in Table 1.

Workload configuration used in examples:

Example 1:       Block size = 8K, Read = 60%, Write = 40% and Access = random

Example 2:       Block size = 8K, Read = 0%, Write = 100% and Access = random

Table 1. Setup configuration


Figure 1 compares IOPS and latency in the baseline environment and the scaled environment for a case in which no scalability or performance issues were found. At all queue depths, IOPS and latency in the scaled environment are within the acceptable range (+/- 5% in this example) of the baseline environment. At higher queue depths, IOPS saturate and do not increase even if the queue depth is increased further. For a fixed block size, throughput and IOPS are directly proportional to each other. The workload (8K random, 60% read, 40% write) scales well and meets the scalability requirements.
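One way to reason about the saturation described above is Little's Law, which relates the number of outstanding IOs (queue depth), IOPS and latency: queue depth = IOPS × latency. The Python sketch below is not part of the original test setup; the saturation figure of 20,000 IOPS is an invented value used purely for illustration.

```python
def expected_latency_ms(queue_depth, iops):
    """Little's Law: outstanding IOs = IOPS x latency (in seconds).

    Rearranged to give the average latency in milliseconds.
    """
    return queue_depth / iops * 1000.0


# Hypothetical system that cannot deliver more than 20,000 IOPS.
SATURATED_IOPS = 20_000

# Once IOPS stop growing, doubling the queue depth doubles the latency.
for queue_depth in (32, 64, 128):
    print(queue_depth, expected_latency_ms(queue_depth, SATURATED_IOPS))
```

This is why IOPS and latency are best read together at each queue depth: beyond the saturation point, a deeper queue adds waiting time without adding IOPS.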


Figure 2 compares IOPS and latency in the baseline environment and the scaled environment for a case in which scalability and performance issues were observed. At all queue depths, IOPS in the scaled environment show significant degradation compared with the baseline environment, and latency in the scaled environment is higher. The deviation measured in the scaled environment is outside the acceptable range (more than +/- 5% in this example) of the baseline environment. The workload (8K random, 100% write) does not meet the scalability requirements or acceptance criteria. In such circumstances, we need to find the bottleneck that limits IOPS in the scaled environment.
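The comparison in the two examples can be automated along the following lines. This is a minimal Python sketch, assuming results are available as a mapping from queue depth to measured IOPS; all names and numbers are made up for illustration and are not taken from the figures.

```python
def percent_deviation(baseline, scaled):
    """Signed deviation of the scaled result from the baseline, in percent."""
    return (scaled - baseline) / baseline * 100.0


def failing_queue_depths(baseline, scaled, tolerance_pct=5.0):
    """Return the queue depths whose deviation exceeds the tolerance."""
    return [
        queue_depth
        for queue_depth in sorted(baseline)
        if abs(percent_deviation(baseline[queue_depth], scaled[queue_depth])) > tolerance_pct
    ]


# Illustrative IOPS per queue depth (invented values).
baseline_iops = {1: 1000, 4: 3800, 16: 9000, 64: 12000}
scaled_within_limits = {1: 980, 4: 3750, 16: 8800, 64: 11700}
scaled_degraded = {1: 700, 4: 2600, 16: 6000, 64: 7900}

print(failing_queue_depths(baseline_iops, scaled_within_limits))
print(failing_queue_depths(baseline_iops, scaled_degraded))
```

The same check can be applied to latency, since the absolute deviation is used. An empty result means the workload meets the acceptance criteria at every queue depth tested.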


4.   Factors affecting performance

The factors which affect the performance and scalability of a storage system are listed below.

  • Disk configuration

It is well known that SSDs are a much faster medium than HDDs. As the total number of drives in a storage pool increases, performance also increases. 10K RPM HDDs are common these days and deliver lower latency than 7.2K RPM drives. Performance varies with RAID configuration and level of virtualization, so both the underlying hardware configuration and the software that virtualizes the storage hardware need to be configured properly. In software defined storage, local disks and disks contributing to the cluster or storage pool need to be connected to separate HBAs, because local IOs should not be included when measurements are taken at cluster level.

  • Caching

SSDs are often used for caching. The size and number of SSDs used for caching play a role in storage performance. Write-through or read-ahead caching is used to improve read performance, whereas write-back caching is used to improve write performance. A sudden drop in write performance could be due to the cache being 100% full. If data is written at a constantly high rate, faster than the cache can be flushed, then even flushing will not help. Statistics such as cache hits and misses need to be monitored.
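The remark about flushing can be made concrete with simple arithmetic. The Python sketch below assumes a constant incoming write rate and a constant flush rate, which real systems rarely have; the names and figures are hypothetical.

```python
import math


def seconds_until_cache_full(free_cache_mb, ingest_mbps, flush_mbps):
    """Time until a write-back cache fills up.

    Returns math.inf when flushing keeps up with incoming writes.
    """
    net_fill_rate = ingest_mbps - flush_mbps
    if net_fill_rate <= 0:
        return math.inf
    return free_cache_mb / net_fill_rate


# 100 GB of free cache, 600 MB/s written, 400 MB/s flushed to disk.
print(seconds_until_cache_full(102_400, ingest_mbps=600, flush_mbps=400))

# Flushing keeps up, so the cache never fills.
print(seconds_until_cache_full(102_400, ingest_mbps=300, flush_mbps=400))
```

Under these assumptions, a sustained write rate above the flush rate always fills the cache eventually; a larger cache only delays the drop in write performance.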

  • CPU and Memory Resources

It is necessary to monitor the CPU and memory usage of hypervisors, virtual machines (VMs), storage nodes and servers in order to find bottlenecks. Servers performing IOs on SSDs might require more CPU resources than servers performing IOs on HDDs. In a virtualized environment, the total number of virtual sockets and cores per socket assigned to VMs are important settings from a scalability point of view. There are other settings such as CPU affinity, page sharing, memory ballooning, etc. We recommend reading the hypervisor documentation for advanced CPU and memory configuration, which is outside the scope of this document.

  • Workload

In general, read operations are faster than write operations. On HDDs, sequential IOs perform better than random IOs because every random IO incurs seek time. On SSDs too, random writes are slower than sequential writes.

  • Background Operations

Background operations such as disk zeroing, RAID rebuilding/recovery due to disk failure or replacement, and restriping due to storage node shutdown/reboot or a change in RAID level should be stopped or allowed to complete before performance measurements are taken.

  • Full Stroke

In a full stroke test, the data read or written during the performance test is spread across all disks and all clustered storage nodes. Create or pick LUNs for IO in such a way that all disks of the storage pool or cluster are used. A short stroke test is performed on a small portion of each HDD; because seek time is low, it might give good performance (lower latency) results that lead to an incorrect conclusion. Full stroke performance results are more reliable than short stroke results.

  • Multipathing

Multipathing policies configured on the initiator side determine how IOs are distributed across multiple paths. For example, in the case of DM-Multipath on Linux (multipath.conf), ‘path_grouping_policy’ determines how paths are grouped and therefore how many paths are used to transfer data at the same time, and ‘path_selector’ determines how IOs are distributed across the paths.

  • Command queuing

Multiple SCSI commands can be active on a LUN at the same time. Queue depth is the number of commands that can be active at a time and is configurable at the SCSI driver level. If the hypervisor issues more commands than the configured queue depth, queuing takes place at the hypervisor level. Under normal circumstances, a command issued to a disk in the storage array is executed immediately. The hypervisor (that is, the VMs running on it) should not consistently issue more commands than the LUN queue depth, as this might result in disk queuing on the storage array as well.

  • Network Configuration

Incorrect network configuration sometimes contributes to degraded storage performance. Industry best practices should therefore be followed, such as keeping the management network separate from the data-path/VM network and using VLANs to control broadcast traffic. NIC teaming and NICs that support a TCP offload engine (TOE) can be used to enhance iSCSI performance. If you decide to use jumbo frames, make sure that an MTU of 9000 is configured end to end on all network equipment.

  • Virtualized Environment

In a virtualized environment, thick provisioned, eager zeroed LUNs should be created for storage performance measurements. Thin provisioned LUNs lead to misleading performance results because disk zeroing takes place just before the actual write.


5.     System stability in scaled environment

A continuing problem with storage systems is how to deal with escalating requirements in a manageable, smooth and non-disruptive manner. In large environments, multiple storage admins work simultaneously and perform operations such as creating volumes, taking snapshots, assigning LUNs, etc. When the system scales to a large extent, requests from many hosts reach the storage system at the same time, so the reliability of the system must be maintained.

The level of concurrency defines how these incoming requests are distributed across storage nodes for processing. Tests must be carried out to verify how efficiently concurrent requests are processed: concurrent requests are sent to the storage nodes and the response time of each request is measured. Neither the level of concurrency nor the response time should degrade as the system scales, and the storage system should respond gracefully to all requests.
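A concurrency test of this kind can be sketched in Python with the standard library. In the sketch below, `create_volume` is a hypothetical stand-in that merely sleeps; in a real test it would be replaced by a call to the storage system's management interface.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def create_volume(name):
    """Hypothetical management request; replace with a real API call."""
    time.sleep(0.01)  # simulate processing time on the storage system
    return name


def timed_call(name):
    """Issue one request and return its response time in seconds."""
    start = time.perf_counter()
    create_volume(name)
    return time.perf_counter() - start


def measure_response_times(concurrency, requests):
    """Send `requests` requests with at most `concurrency` in flight."""
    names = [f"vol-{i}" for i in range(requests)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, names))


for level in (1, 8, 32):
    times = measure_response_times(concurrency=level, requests=32)
    average_ms = sum(times) / len(times) * 1000
    print(f"concurrency={level} max={max(times) * 1000:.1f} ms avg={average_ms:.1f} ms")
```

Plotting the average and maximum response time against the concurrency level shows whether response time degrades as more requests arrive in parallel.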

Stability testing can be performed by perturbing the system. For example, nodes in the cluster can be shut down, rebooted, removed or added while IOs are in progress.

5.1.   Test automation and tools

Repeatedly creating a large setup and generating a load of concurrent requests requires test automation and tools. Without them, human errors are often introduced, which leads to misleading results and increases the time needed to prepare the setup and execute the test strategy.

6.     Conclusion

As a storage system scales, it is important to find out whether performance degrades and, if it does, by how much. Degradation can be identified and reduced through a systematic approach of comparing results with the baseline and constantly monitoring resources. With this approach, system stability can be maintained even when the storage system scales to a large extent.



By Mahesh Kamthe and Shubhada Savdekar


Agiliad wins design contest at the 26th International Conference on VLSI Design and Embedded Systems!

Team Agiliad participated in the design contest at the 26th International Conference on VLSI Design and 12th International Conference on Embedded Systems 2013, recently held in Pune. We presented a novel solution for cost-effective and reliable estimation of fetal gestational age in resource-poor settings. The solution measures the symphysis-fundus height of a pregnant woman using an image processing application built on Raspberry Pi, a $25 open-source computing platform, augmented with a mechanical frame for referencing the region of interest. The concept was highly appreciated at the conference and we emerged as winners of the design contest. Following is a brief overview of the problem that we identified along with the proposed solution. An illustrated presentation of the concept can be downloaded here!

Problem Addressed: Reliable estimation of fetus gestational age in resource poor settings

Estimating the gestational age of the fetus is an important clinical practice, crucial for monitoring the health of both mother and fetus. The conventional method for this estimation is ultrasonography. However, due to the lack of high-end infrastructure in resource-poor settings, this method is not practical there. Another method is to measure the symphysis-fundus height (SFH) using a measuring tape (shown in the figure below). This method has been approved as suitable for rural settings. But because health workers at the primary level lack proper training and documentation methods, process variations and measurement errors are widely prevalent, leading to highly unreliable estimates. SFH measurement also facilitates early screening for macrosomia (excessive fetal weight), fetal growth retardation and multiple pregnancies. Hence, there is a definite need for a cost-effective technical solution that overcomes the shortcomings of the manual measurement method and helps in multi-stage documentation of the procedure over the entire nine months.

Figure: Fundal height measurement

Solution Proposed: Measurement of symphysis-fundus height using Raspberry Pi image processing platform

The key drivers of the technical solution to this problem are low cost, ease of use and accuracy. The solution was derived from one of our in-house initiatives to leverage low-cost open-source hardware to build a generic computing platform for diverse applications, ranging from building automation to point-of-care medical diagnostic devices.

The present solution comprises a mechanical frame attached to the patient bed. The frame carries three markers mounted on top of three telescopic pillars, which allow the markers to be positioned in the vertical plane. A web camera is also fixed to the frame at a certain distance from the patient bed. One of the markers is used to reference the camera to the patient by correlating the diameter of the circular marker as it appears in the image with its actual known diameter. The other two markers are positioned at suitable points on the fundus, between which the length of the curve has to be calculated. A schematic diagram of the experimental setup is shown below.

A customized image processing algorithm is implemented on a Raspberry Pi computing platform and consists of the following key steps: marker detection, edge image conversion, boundary tracing and distance calculation. The Raspberry Pi is $25 open-source hardware based on an ARM1176JZF-S running at 700 MHz, with a VideoCore IV GPU (Blu-ray quality playback) in a Broadcom BCM2835 SoC with 256 MB RAM, 2 USB ports and an Ethernet port. The Raspberry Pi runs Linux kernel-based operating systems.

The design and the algorithm were tested on a dummy model of the fundus and accurate length measurements were obtained. The next steps in this project are testing the solution in a real clinical setting, building a mobile application on a similar concept, and estimating the amniotic fluid index in a pregnant woman using the depth sensor of Microsoft Kinect.
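The referencing and distance calculation steps described above can be illustrated with a short Python sketch. It is not the algorithm used in the project: it assumes that marker detection and boundary tracing have already produced pixel coordinates, and that the marker and the traced curve lie at the same distance from the camera. All names and numbers are hypothetical.

```python
import math


def mm_per_pixel(marker_diameter_mm, marker_diameter_px):
    """Image scale from the known and the observed marker diameter."""
    return marker_diameter_mm / marker_diameter_px


def curve_length_mm(boundary_px, scale_mm_per_px):
    """Length of a traced boundary, summed over straight segments
    between consecutive points and converted to millimetres."""
    length_px = sum(
        math.dist(p, q) for p, q in zip(boundary_px, boundary_px[1:])
    )
    return length_px * scale_mm_per_px


# A 50 mm marker that appears 100 pixels wide gives 0.5 mm per pixel.
scale = mm_per_pixel(marker_diameter_mm=50, marker_diameter_px=100)

# Hypothetical traced boundary between the two fundus markers.
boundary = [(0, 0), (3, 4), (6, 8)]
print(curve_length_mm(boundary, scale))
```

The more densely the boundary is traced, the closer the sum of straight segments comes to the true length of the curve.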

Figure: Fundal height (schematic of the experimental setup)

We wish to scout for more such elementary problems and come up with effective and innovative solutions to tackle them!

Storage Accelerators: Bridging Cloud Computing Storage I/O Bottleneck

It’s no secret that the cloud computing market has been growing rapidly for both public and private deployments, driving hyper-scale infrastructure to store, process and deliver ever-increasing volumes of data. To meet growing cloud application demands and cut infrastructure costs, public and enterprise clouds are increasingly using virtual machines (VMs) to consolidate applications onto fewer servers.

But having addressed the problem of under-utilized servers through enterprise virtualization, the next big challenge that cloud computing faces is the storage I/O bottleneck that comes with large-scale VM deployments. As the number of VMs grows, the corresponding explosion in random read and write I/Os inevitably brings a network attached storage/storage area network (NAS/SAN) array or local direct attached storage (DAS) to its knees as disk I/O or target-side CPU performance becomes the bottleneck.

To work around these pain points, storage managers add capacity to their storage infrastructure to meet performance demand. Essentially, they are trying to give the storage system access to enough hard disk spindles that it can respond more quickly to the massive random I/O these environments generate. These “solutions” lead to racks and racks of disk shelves with very low actual capacity utilization, and they do not scale effectively in purchase cost, administration overhead or maintenance expenses (power, cooling, floor space). This justifies the need for storage accelerators.

Options available and the propitious solution

This is where SSDs are attracting storage innovators. Although relatively low in capacity, solid state storage provides extremely high input/output operations per second (IOPS) and can potentially solve most storage I/O related challenges in the modern data center.

Vendors like EMC, Marvell, QLogic, NetApp and Dell are all developing solutions to bridge their customers to SSDs. Following are the ways in which SSDs can be deployed, along with their individual limitations:

Fixed Placement:

Fixed placement on solid state storage may be acceptable for workloads where specific subsets of data can be placed on SSDs, database hot files (e.g. indexes, aggregate tables, materialized views) being good examples. However, fixed placement often does not support a full complement of storage services (snapshots, replication, etc.), and many products lack complete high availability options or make high availability too costly to implement, which rules this out as a potential solution.

Automated Tiering:

Automated tiering works by promoting sections of data to high performance storage as they become active and demoting them as they become less active. To implement this solution, however, the storage system must support automated tiering, which may require upgrading to new storage infrastructure. Secondly, depending on the size of the sections of data to be promoted, the time it takes for the storage controller to analyze data access patterns and start promoting data to the SSD tier can delay the return on investment (ROI) by days or weeks. The third and considerable limitation of this option is SSD wear-out, caused by write amplification from the constant reading and writing of large chunks of data.
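The promote/demote decision at the heart of automated tiering can be sketched in Python as follows. This is a deliberately simplified illustration, not any vendor's algorithm: it assumes a log of accessed extent identifiers for one observation window and an SSD tier that holds a fixed number of extents. All names are hypothetical.

```python
from collections import Counter


def plan_tiering(access_log, ssd_extents, current_ssd):
    """Return (promote, demote) so the SSD tier holds the hottest extents.

    access_log  -- extent identifiers, one entry per access in the window
    ssd_extents -- number of extents the SSD tier can hold
    current_ssd -- set of extents currently on the SSD tier
    """
    heat = Counter(access_log)
    hottest = {extent for extent, _ in heat.most_common(ssd_extents)}
    return hottest - current_ssd, current_ssd - hottest


# Accesses observed in one window; e1 and e2 are the most active extents.
log = ["e1", "e2", "e1", "e3", "e1", "e2", "e4"]
promote, demote = plan_tiering(log, ssd_extents=2, current_ssd={"e1", "e3"})
print(promote, demote)
```

Every promotion and demotion moves a whole extent between tiers, which is where the delay and the additional SSD writes mentioned above come from.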

Cache Appliances:

To alleviate some of these issues, several third party manufacturers have created external caching appliances. These systems sit inline between the servers accessing storage and the storage itself; in other words, all traffic must flow through them. This creates a broad caching tier that gives a performance boost to more of the storage, but it may be too broad, since not all data passing through these devices is appropriate for caching. Because of their inline nature, these solid state caches are also vulnerable to the performance limitations of the storage network and the storage controller. Finally, the inline caching appliance itself can become a limit to scale and can be overrun when many application servers channel storage I/O through it.

Limitations of the available off-server solutions

None of the three solutions mentioned above actually improves the performance of the storage network or the storage controller; in fact, they often expose its shortcomings.

They also ignore the fact that the device needing the storage I/O performance boost is the application server or virtual host. This is where server based storage accelerators come in.

Server Based Storage Acceleration

Server based acceleration via caching takes the concepts of the cache appliance and moves them into the server, typically via a PCIe card. This provides several significant advantages. First, the problem is fixed closer to the source (the application or hypervisor), and cached I/O does not need to traverse the storage network. Secondly, instead of deploying something universally to solve a specific problem, the solution is deployed selectively, only on the servers where the problem exists and more performance is needed, which makes it more cost-effective.

Variations to Server Based Caching Accelerators:

Software Based Server accelerators:

In this category of software for virtualized servers, the caching decision is made on the host, much closer to the source than in either caching appliances or disk arrays.

Leading the chase is Fusion-io’s ioTurbine, application-level caching software that runs in the background as a component in the hypervisor and in the guest operating system. The caching decision is made in the guest OS rather than the host OS, right where the application generates the data, which delivers accelerated performance directly to the virtualized applications that require it.

What helps Fusion-io’s ioTurbine provide a low latency, high IOPS caching solution is the ioDrive’s VSL layer. With VSL, the CPU interacts directly with the ioDrive as though it were just another memory tier below DRAM. Otherwise, access would have to be serialized through a RAID controller and embedded processors, resulting in unnecessary context switching and queuing bottlenecks, and eventually in high latency.

The Virtual Storage Layer (VSL) virtualizes the NAND flash arrays by combining two key elements of operating systems, the I/O subsystem and the virtual memory subsystem. It uses “block tables” that translate block addresses to physical ioDrive addresses, analogous to a virtual memory subsystem. With VSL these “block tables” are stored in host memory, whereas other solid state architectures store them in embedded RAM and hence have to pass through legacy protocols.

However, such software consumes host resources such as CPU and memory for flash management tasks like wear leveling and garbage collection. These tasks are heavy users of CPU that by rights should be dedicated to serving the application.

Secondly, some other caching software offerings are currently not integrated with solid state devices. The software is purchased from one vendor and the solid state device from another, which typically raises the question: do the software and the solid state device work well together?

Hardware Based Server accelerators

In contrast to software based accelerators, hardware based storage server accelerators (SSAs) may take one of the following forms:

  1. Integrated storage adapters i.e. HBA or NIC enabled caches
  2. Solid state PCIe devices/ SAS or SATA SSD devices.

Solid state PCIe Adapter Devices:

PCIe adapter storage accelerators, usually comprising SSDs, DRAM and embedded firmware, work by intercepting IO, redirecting it to high speed local storage (SSDs) and thereby accelerating it. This requires a combination of a tightly coupled IO interaction layer and an innovative hardware layer built around a special purpose ASIC.

This intermediation is not elementary. It is available today only because of intersecting trends in operating systems, virtualization, consolidation and processing power that have enhanced the ability of vendors to interact with storage IO paths. Today there is a much better IO stack to interact with than ever before, irrespective of the operating system, application or hypervisor under consideration, and this makes server based acceleration possible. An additional unique feature is the ability to create a host I/O cache that is agnostic to all network and local-attached storage protocols. It can be configured to serve as a data cache for DAS, SAN or NAS storage arrays, irrespective of the protocol, such as iSCSI, SAS or NFS, used to access the data storage.

Leading the innovation are well known vendors such as Marvell with DragonFly and EMC with VFCache.

Marvell’s DragonFly enables the creation of a next-generation cloud-optimized data center architecture where data is automatically cached in a low-latency, high-bandwidth “host I/O cache” in application servers on its way to/from higher-latency, higher-capacity data storage. A unique differentiator for Marvell DragonFly is its use of a sophisticated NVRAM log-structured approach for flash-aware write buffering and re-ordered coalescing. Unlike writing to SSDs that quickly degrade after a certain number of random writes, DragonFly ensures consistently high performance and low-latency with zero write performance degradation over time.

On the other hand EMC’s VFCache accelerates reads and protects data by using a write-through cache to the networked storage to deliver persistent high availability, integrity and disaster recovery.

VFCache coupled with array based EMC FAST technology on EMC storage arrays can help place application data in the right storage tier based on how frequently the data is accessed. VFCache extends FAST technology from the storage array to the server by identifying the most frequently accessed data and promoting it into the tier closest to the application.

All in all, VFCache is a hardware and software server caching solution that aims to dramatically improve application response time and deliver more IOPS.
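The write-through behaviour mentioned above can be illustrated with a small Python sketch. It is a generic model of a write-through cache, not EMC's implementation: a dictionary stands in for the networked storage, another for the server-side flash, and updating the cache on writes is an assumption made for this sketch.

```python
class WriteThroughCache:
    """Minimal write-through cache in front of a backing store."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for networked storage
        self.cache = {}                     # stands in for server-side flash
        self.hits = 0
        self.misses = 0

    def write(self, block, data):
        # Every write goes to the backing store first, so the array
        # always holds the authoritative copy of the data.
        self.backing_store[block] = data
        self.cache[block] = data

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            return self.cache[block]
        # Cache miss: fetch from the backing store and keep a copy.
        self.misses += 1
        data = self.backing_store[block]
        self.cache[block] = data
        return data


array = {"a": 1}
cache = WriteThroughCache(array)
cache.read("a")      # miss, served from the array
cache.read("a")      # hit, served from the cache
cache.write("b", 2)  # written to the array and to the cache
print(cache.hits, cache.misses, array)
```

Because every write reaches the array, losing the server-side cache loses no data, which is why write-through designs accelerate reads rather than writes.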

Integrated storage adapters (HBA or NIC enabled caches):

In the current market, the first and only product in the category of integrated storage adapters, i.e. HBA or NIC enabled caches, is QLogic’s network based adapter Mt. Rainier.

Mt. Rainier is a combination of an enterprise server I/O adapter, a flash/SSD adapter, an optimized driver and onboard firmware intelligence. This enhanced network HBA captures all I/O seamlessly and redirects it to flash media attached to PCIe flash storage.

In future, such accelerators, ready to deploy in the infrastructure with no additional software required, could eventually become the de facto HBA/CNA.




Focus: Hadoop (Part 1)

A Google Trends graph for “Hadoop” and related technologies shows an interesting picture. Interest in Hadoop, measured by web searches, has increased steadily and continues to increase. It seems as if “Hadoop” and “Big Data” are replacing “Data mining” as keywords. Hadoop has aided big-data analytics, which is a buzzword everywhere these days. What was “Big” a few years back seems very “small” now. “Big” keeps becoming “Bigger”. Hadoop enables us to bridge the gap.





This brief article (Part 1 in the series) gives an overview of Hadoop: its history, the technology and future trends.

Hadoop is not new. The underlying technology is used by Google for web indexing and by organizations worldwide for big-data analytics. It is in fact even used by the Mars “rover” mission to help determine whether life ever existed on Mars. Hadoop shines, through its cluster based distributed system, wherever a sheer volume of data needs to be handled.

In finance, if you want to do accurate portfolio evaluation and risk analysis, you can build sophisticated models that are difficult to put into a database engine. But Hadoop can handle it. In online retail, if you want to deliver better search answers to your customers so they’re more likely to buy the thing you show them, that sort of problem is also well addressed by Hadoop.

Hadoop is an open source project from Apache that has evolved rapidly into a major technology movement. It has emerged as one of the leading ways to handle massive amounts of data, including not only structured data but also complex, unstructured data.

Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open source web search engine, itself a part of the Lucene project. The name Hadoop is not an acronym; it’s a made-up name. The project’s creator, Doug Cutting, explains how the name came about:


“The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid’s term.”

The underlying technology was invented by Google back in their earlier days so they could usefully index all the rich textual and structural information they were collecting, and then present meaningful and actionable results to users. There was nothing on the market that would let them do that, so they built their own platform. Google’s innovations were incorporated into Nutch, an open source project, and Hadoop was later spun off from that. Yahoo has played a key role in developing Hadoop for enterprise applications.

Simply put, Hadoop provides a reliable shared storage and analysis system. The storage is provided by the Hadoop Distributed File System (HDFS) and the analysis by MapReduce. These are the core components of Hadoop. However, Hadoop also has several other components, such as:

  • Hive (queries and data summarization)
  • Pig (processing large data sets)
  • HBase (column-oriented NoSQL data storage system)
  • ZooKeeper (coordinating processes)
  • Ambari (administration)
  • HCatalog (metadata management service)

HDFS is a filesystem designed for storing very large files reliably with streaming data access patterns, running on clusters of commodity hardware. As the name implies, HDFS is a distributed filesystem, and hence has all the complications of network-based filesystems, such as maintaining consistency and handling node failures. However, by distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size.

MapReduce is a framework for processing “embarrassingly parallel” problems across huge datasets using a large number of computers. It exploits data locality to reduce the transmission of data between nodes. As the name implies, it consists of two steps: Map and Reduce. “Map” divides the problem into subproblems and distributes them across a cluster of nodes, while “Reduce” collects the answers from all the nodes in the cluster and merges the results. MapReduce is not specific to Hadoop and has been applied in other systems as well. For example, at Google, MapReduce was used to completely regenerate Google’s index of the World Wide Web.
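
To make the two steps concrete, here is a minimal single-machine sketch of the Map and Reduce idea using word counting. It is an illustration only, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and each string in the input list merely stands in for the slice of data held by one node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group intermediate values by key, as a framework would between the two steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # Each chunk stands in for the part of the dataset held by one node.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

counts = word_count(["big data is big", "data keeps growing"])
```

In a real cluster the map calls would run in parallel on the nodes that already hold each chunk, which is where the data locality benefit comes from.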

The premise of MapReduce is that the entire dataset—or at least a good portion of it—is processed for each query. But this is its power. MapReduce is a batch query processor, and the ability to run an ad hoc query against your whole dataset and get the results in a reasonable time is transformative. It changes the way you think about data and unlocks data that was previously archived on tape or disk. It gives people the opportunity to innovate with data. Questions that took too long to get answered before can now be answered, which in turn leads to new questions and new insights. This enables solutions like big data analysis.

Hadoop is designed to run on a large number of machines that don’t share any memory or disks. That means you can buy a whole bunch of commodity servers, slap them in a rack, and run the Hadoop software on each one. When you want to load all of your organization’s data into Hadoop, the software breaks that data into pieces that it then spreads across your different servers. There’s no one place where you go to talk to all of your data; Hadoop keeps track of where the data resides. And because multiple copies are stored, data held on a server that goes offline or dies can be automatically re-replicated from a known good copy.
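
The idea of splitting data into pieces, spreading copies across servers and re-copying after a failure can be sketched with a toy model. This is not how HDFS actually chooses servers; the round-robin placement, the tiny block size, and the names place_blocks and handle_failure are assumptions made purely for illustration.

```python
def place_blocks(data, servers, block_size=4, replicas=2):
    """Split data into blocks and assign each block to `replicas` distinct servers."""
    placement = {}  # block index -> (block bytes, list of server names)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for index, block in enumerate(blocks):
        # Round-robin placement (an assumption); needs replicas <= len(servers).
        holders = [servers[(index + r) % len(servers)] for r in range(replicas)]
        placement[index] = (block, holders)
    return placement

def handle_failure(placement, servers, dead):
    """Re-replicate every block that lost a copy on the dead server."""
    alive = [s for s in servers if s != dead]
    for index, (block, holders) in placement.items():
        if dead in holders:
            survivors = [s for s in holders if s != dead]
            # Copy from a surviving replica to a live server that lacks the block.
            target = next(s for s in alive if s not in survivors)
            placement[index] = (block, survivors + [target])
    return placement

servers = ["node-a", "node-b", "node-c"]
placement = place_blocks(b"organization data", servers)
placement = handle_failure(placement, servers, dead="node-b")
```

After the simulated failure every block still has two copies, none of them on the dead server, and the placement table is the "keeps track of where the data resides" part of the description.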

Despite all the advantages provided by Hadoop, there are use case scenarios where Hadoop does not serve well. Such use cases include scenarios where we have:

  • Low-latency access
  • Lots of small files
  • Multiple writers, arbitrary file modifications

Quantcast recently announced the open-sourcing of its Quantcast File System (QFS), which it claims provides better throughput than HDFS. It will be interesting to study how the two compare in performance tests. But Quantcast isn’t the only company that has replaced HDFS. MapR’s commercial distribution of Hadoop uses a proprietary file system, and DataStax Enterprise uses Apache Cassandra to replace HDFS.

Over the next parts in this series, we shall talk about the Hadoop components in more detail.



Hadoop: What it is, how it works, and what it can do

What is Apache Hadoop?

Trends in Big Connectivity: Big Data, Hadoop and Life on Mars

Quantcast Open Sources Hadoop Distributed File System Alternative

Hardware mobile apps – making smart phones ‘medically’ smarter!

In an era when smart phones and mobile gadgets are becoming smarter day by day, it does not take much effort to see how their ‘smartness’ can be put to innovative medical use. Mobile apps for conventional medical alerts, reminders and the monitoring of health parameters (blood sugar, blood pressure, BMI etc.) have been in widespread use for a long time. Voxiva, a Washington D.C. based company, provides mobile health-coaching programs which target a wide variety of users, including pregnant women, diabetics, and smokers. SpiroSmart is a recent innovative iPhone app which enables the measurement and analysis of conventional lung function parameters. However, applications based on mobile devices have reached an altogether new dimension with the rapid development of innovative ‘mobile hardware apps’ for diverse medical use. These pieces of hardware are used in conjunction with a conventional smart phone as potential medical diagnostic devices. Let us take a closer look at some of the most interesting (and technologically stimulating!) hardware mobile apps:


Netra is a solution proposed by the Camera Culture Group at MIT. It is an inexpensive mobile hardware app based on an inverse Shack-Hartmann sensor for estimating refractive errors in the human eye. The key idea is to interface a lenticular view-dependent display with the human eye at close range, just a few millimeters apart.

Image Source: Camera Culture Group, MIT Media Labs


The OScan team at Stanford University has developed an affordable screening tool that brings standardized, multi-modal imaging of the oral cavity into the hands of rural health workers around the world, allowing individuals to conduct screenings for oral lesions. This inexpensive device mounts on a conventional camera phone and allows for data to be instantly transmitted to dentists and oral surgeons. OScan aims to empower minimally-skilled health workers to connect early stage patients to health care providers and teach communities about the importance of oral hygiene.


Mobisante, a Redmond-based company, has developed a mobile ultrasound system (MobiUS) which includes a Toshiba Windows Mobile-powered smart phone, an ultrasound probe, and the accompanying Mobisante software. The exam settings include “Quick Scan” (a general purpose setting), AAA, FAST, Cardiac, OB, Pelvis, Vascular and small organs.

e-Petri Dish

With the ePetri Dish system, scientists no longer have to remove the cells from the incubator; they can simply look at the images on a laptop. Less manipulation makes for better cell health and a reduced risk of contamination. With the ePetri system, cells are grown on a CMOS image sensor, the kind found in common digital cameras. A smartphone placed above the sensor provides, via a commercially available app, a scanning spot of light that sweeps back and forth across its LED screen.


Diabeto is a non-intrusive Bluetooth-enabled device that connects to a glucometer and transmits data to a mobile phone. The Diabeto device can transmit to any diabetes mobile application. The Diabeto app will also have multiple utilities that can check your blood sugar levels, show history, suggest a diet, notify the physician etc.


The RVA Smart-clamp is a universal endoscope adapter which enables pictures and video to be taken with a mobile phone camera. It is unique among these apps in being a purely mechanical device, and it lets the surgeon view endoscopic images in real time with great ease.


SmartHeart is a gadget that turns a mobile phone into a powerful medical tool able to detect heart problems. It connects to a smartphone and converts it into a hospital-grade heart monitor capable of performing electrocardiograms in just 30 seconds. The device hooks around the user’s chest and records their heart rate by measuring the heart’s electrical activity.

Image source: SHL Telemedicine


CellScope’s clip-on otoscope helps pediatricians increase the standard of care by creating a visual history of the middle ear, and saves parents time by allowing ear infections to be diagnosed and treated remotely. Also, CellScope’s innovative clip-on dermascope enables patients to capture and transmit high-magnification, diagnostic-quality images of the skin from the privacy and convenience of their own homes.


Flow cytometry is a technique for counting and examining cells, bacteria and other microscopic particles. Researchers at the BioPhotonics Laboratory at the UCLA Henry Samueli School of Engineering and Applied Science have developed a compact, lightweight and cost-effective optofluidic platform that integrates imaging cytometry and fluorescent microscopy and can be attached to a cell phone. The resulting device can be used to rapidly image bodily fluids for cell counts or cell analysis.

Adoption of Multi Core processors for industrial applications – Opportunities and Challenges

Although the semiconductor industry has not been able to keep pace with Moore’s law since 2006, the increase in chip frequencies has brought new challenges in terms of power consumption. This has led to the evolution of multi-core processors (MCPs), which have already made a significant mark in the desktop computer market, with all major semiconductor companies producing processors with 2, 4 and even up to 16 processing cores.

Multi-core processor technology has opened up new avenues in other areas as well, and one domain that has started adopting it significantly is industrial automation and robotics. With a parallel evolution of operating systems and application software for industrial applications, various control devices such as PLCs, micro-controllers and human interface devices can be combined to run on a single-board platform, which was difficult to do with single-core architectures. With the varied software configurations that are possible, an MCP architecture gives users a great deal of choice and flexibility. For example, one core can be dedicated to a complex process or a critical function such as a safety or redundancy module, while another core remains available for non-critical operations.

Though in theory multiple cores would enhance the overall computing performance of the platform, realizing the potential of multi-core processing poses a significant challenge to software designers. To realize the benefits of MCPs, programmers must strive for as much parallelism as possible without compromising the real-time determinism of their applications.

Two software configurations are possible with MCPs: Symmetric Multi Processing (SMP) and Asymmetric Multi Processing (AMP). With a single operating system managing all the cores and scheduling tasks between them, SMP can deliver true parallelism provided the application is split into multiple threads. This brings to the fore the issue of redesigning existing applications to use thread affinity and multi-threading constructs. Programmers have to be trained to concentrate on parallelism, which they are not used to on single-core architectures. Also, while an SMP architecture provides enhanced performance if the parallelism is exploited adequately, it can adversely impact real-time determinism, which can be crucial in real-time systems.
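
As a rough sketch of what splitting an application into threads looks like, the example below divides a workload into slices, processes them in a thread pool, and combines the partial results. The helper names (split, parallel_sum, pin_current_process) are invented for this illustration. The affinity helper relies on os.sched_setaffinity, which Python only exposes on Linux, so it is guarded and not called here. Note also that in standard CPython, threads do not execute Python bytecode on several cores at once; the structure is what matters in this sketch, not the speed-up.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def pin_current_process(cores):
    """Restrict this process to the given cores where the OS exposes the call."""
    if hasattr(os, "sched_setaffinity"):  # available on Linux only
        os.sched_setaffinity(0, cores)
        return True
    return False

def split(data, parts):
    """Cut data into at most `parts` nearly equal slices, one per worker thread."""
    size = max(1, -(-len(data) // parts))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, workers=4):
    """Sum the slices in separate threads and combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, split(data, workers)))

total = parallel_sum(list(range(1000)))
```

On a Linux target, calling pin_current_process({0}) before starting the work would confine the process to core 0, which is the kind of thread or process affinity the paragraph above refers to.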

In Asymmetric Multi Processing platforms, multiple operating systems run simultaneously in the system, one for each core. The hardware peripherals are distributed between the operating systems. Since each OS manages only one core, there is hardly any need to redesign applications, which eases porting from single-core to multi-core platforms. AMP also preserves real-time determinism, because each OS has only one core on which to schedule tasks, just as in a single-core design. However, AMP is limited in the parallelism it can exploit on a multi-core setup, because the operating system running on one core may not know whether other cores are idle and cannot schedule tasks on them.

Since each configuration has its benefits and limitations, the choice depends entirely on the nature of the application. With MCPs that have more than two cores, a hybrid configuration is also possible in which SMP and AMP co-exist. In a quad-core processor, for example, a single core can be configured in AMP mode to run a critical task and ensure real-time determinism, while the other three cores are configured to run in SMP mode.
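
The hybrid split can be written down as a small planning function. This is only a sketch of the partitioning decision, not a real board configuration; the function name plan_hybrid and the task name safety_module are hypothetical.

```python
def plan_hybrid(total_cores, critical_tasks):
    """Dedicate one core per critical task (AMP) and pool the rest (SMP)."""
    if len(critical_tasks) >= total_cores:
        raise ValueError("need at least one core left for the SMP pool")
    # Critical tasks get cores 0, 1, ... in the order given.
    amp = {task: core for core, task in enumerate(critical_tasks)}
    smp_pool = list(range(len(critical_tasks), total_cores))
    return {"amp": amp, "smp": smp_pool}

plan = plan_hybrid(4, ["safety_module"])
```

For the quad-core case described above, the plan dedicates core 0 to the critical task and leaves cores 1 to 3 as the SMP pool.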

The automation industry is slowly adopting MCPs, with higher-end controllers first, followed by lower-end controllers as costs come down. Alongside the evolution of software, associated tools such as compilers and debuggers also need to evolve to enable the best use of MCP platforms. There are already debuggers that can debug and visualize multi-threading, including the interaction between threads, and compilers that can map application code to a specific core, reducing the effort required from programmers. Even so, there is still a lot more to do to leverage MCPs for critical industrial automation platforms.


Spotlight: WirelessHART – Wireless Solution for Sensor Networks in process industry

On 17th of September 2012, AutomationWorld reported the unveiling of Emerson’s IEC 62591 compliant WirelessHART interface for use with its remote terminal units. Emerson has targeted this interface at upstream oil and gas applications and believes that WirelessHART should make the sensing network extremely flexible without compromising communication reliability. News of a process controls giant taking a leap of faith in adopting wireless networking for critical sensor networks may seem a big step for the process industry, but those of us who have been following this evolution, especially that of WirelessHART, aren’t very surprised.

From its first release in 2007 until now, its adoption has gained terrific momentum, and in one direction only. While one of the principal driving forces behind the protocol has been the process giant Emerson, others such as ABB, E+H and Nivis have joined hands to build WirelessHART-based products. The phenomenal growth is also fuelled by the proliferation of wireless sensor networks in the process industry. Although there is a competing standard from ISA (100.11a), which is marketed as a future-proof standard, WirelessHART is growing very fast because of the millions of existing connected HART-based devices. More than 8,000 WirelessHART networks are currently installed at major manufacturing sites around the globe, and the number of devices has tripled from 12 million to about 35 million in the last 2 years, signifying the acceptance of the WirelessHART standard by the process automation industry.

What is WirelessHART and how does the protocol enable reliable industrial grade wireless communication?

WirelessHART is a wireless sensor networking technology based on the Highway Addressable Remote Transducer (HART) protocol. It uses IEEE 802.15.4 compatible radios operating in the 2.4GHz ISM band, employing direct sequence spread spectrum technology and channel hopping for communication security and reliability, as well as TDMA-synchronised, latency-controlled communication between devices on the network. Each device in the mesh network can serve as a router for messages from other devices, which extends the range of the network and provides redundant communication routes to increase reliability. The Network Manager determines the redundant routes based on latency, efficiency and reliability. To ensure the redundant routes remain open and unobstructed, messages continuously alternate between the redundant paths.

If a message is unable to reach its destination by one path, it is automatically re-routed to follow an established redundant path without data loss. WirelessHART supports multiple messaging modes including one-way publishing of process and control values, spontaneous notification by exception, ad-hoc request/response, and auto-segmented block transfers of large data sets. These capabilities allow communications to be tailored to application requirements thereby reducing power usage and overhead.
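
The alternation between redundant paths and the automatic re-routing described above can be sketched as follows. This is a simplified model rather than the WirelessHART routing algorithm itself: the class name RedundantRoute, the hop names and the link_up callback are all assumptions made for illustration.

```python
from itertools import count

class RedundantRoute:
    """Alternate between redundant paths and fail over when a path is down."""

    def __init__(self, paths):
        self.paths = paths      # each path is a list of hop names
        self._turn = count()    # rotates the preferred path per message

    def send(self, message, link_up):
        """Try each path once, starting with this message's preferred path.

        The payload is not inspected in this sketch; only the route matters.
        """
        start = next(self._turn) % len(self.paths)
        for offset in range(len(self.paths)):
            path = self.paths[(start + offset) % len(self.paths)]
            if all(link_up(a, b) for a, b in zip(path, path[1:])):
                return path     # delivered along this path
        return None             # every redundant path is blocked

down = {("sensor", "router-1")}  # simulated obstruction on one link
route = RedundantRoute([
    ["sensor", "router-1", "gateway"],
    ["sensor", "router-2", "gateway"],
])
used = [route.send(f"pv-{i}", lambda a, b: (a, b) not in down) for i in range(2)]
```

With the link to router-1 obstructed, both messages arrive via router-2; once the obstruction clears, sends go back to alternating between the two paths, which keeps both routes exercised.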

What makes the WirelessHART protocol a promising technology?

  • First up, it is built on a solid HART standard foundation, ensuring that it addresses the basic challenges regarding handling process measurement and control problems. Also starting out with an established protocol reduces the risk of unforeseen problems with the technology or the development process.
  • The HART protocol fundamentally supports on-demand communication, transmitting only when needed. This makes it a good choice for wireless applications where long battery life is important, in contrast to most other bus protocols, which require continuous communication that drains batteries quickly. It also permits selection of the power option that best meets application needs. Example options include long-life batteries, solar power, line power, and loop power. Other measures used to reduce communication load are Smart Data Publishing and Notification by Exception.
  • The onboard diagnostics in millions of installed HART devices mostly go unused because their host systems can’t access digital HART data. WirelessHART adapters unlock this ‘trapped’ data by providing a new communication path to asset-management systems, historians or other tools.
  • WirelessHART includes several features to enhance reliable communications:
    • Redundant mesh routing (space diversity): the WirelessHART mesh topology is self-organising and self-healing. If interference or another obstacle interrupts a communication path, the network immediately (and automatically) re-routes transmissions over a path-optimised, redundant mesh route.
    • Channel hopping (frequency diversity): WirelessHART ‘hops’ across the 16 channels defined by the IEEE 802.15.4 radio standard to overcome interference in the ISM band. Automatic clear-channel assessment before each transmission and channel blacklisting may also be used to avoid specific areas of interference and minimise interference to others.
    • Time synchronised communication (time diversity): All device-to-device communication is done in a pre-scheduled time window, which enables collision-free, power-efficient, and scalable communication. Each message has a defined priority to ensure appropriate Quality of Service (QoS) delivery. Fixed time slots also enable the Network Manager to create the optimum network for any application without user intervention.
    • Additional techniques such as DSSS technology (coding diversity) and adjustable transmission power (power diversity) also help WirelessHART provide reliable communication even in the midst of other wireless networks.
  • WirelessHART employs robust security measures to ensure the network and data are protected at all times. These measures include:
    • 128-bit encryption prevents sensitive data from being intercepted
    • Verification where Message Integrity Codes verify each packet
    • Key Management where rotating keys can prevent unauthorised devices from joining or communicating on the network
    • Authentication ensures that devices aren’t allowed onto the network without authorisation
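
The channel hopping, blacklisting and time-slotted communication listed above can be illustrated with a short sketch. The channel numbers 11 to 26 are the sixteen 2.4GHz channels of IEEE 802.15.4. The hopping rule used here, (slot number + channel offset) modulo the number of active channels, is an assumption chosen for illustration and is not quoted from the standard; the function names are likewise hypothetical.

```python
CHANNELS_2_4_GHZ = list(range(11, 27))  # the 16 IEEE 802.15.4 channels, 11..26

def active_channels(blacklist=()):
    """Channels left after removing blacklisted (known-noisy) ones."""
    return [ch for ch in CHANNELS_2_4_GHZ if ch not in blacklist]

def channel_for_slot(slot_number, channel_offset, blacklist=()):
    """Pick the channel a link uses in a given time slot.

    Assumed rule for illustration: index = (slot + offset) mod active count.
    """
    channels = active_channels(blacklist)
    return channels[(slot_number + channel_offset) % len(channels)]

# Two links sharing the same time slots avoid colliding by using different offsets.
noisy = {15, 16}
schedule = [
    (slot, channel_for_slot(slot, 0, noisy), channel_for_slot(slot, 1, noisy))
    for slot in range(4)
]
```

Each tuple holds the slot number and the channels used by the two links in that slot. The links never share a channel within a slot, and the blacklisted channels are skipped entirely.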


Lights, Sound and Magnetism – the science behind next-generation medical technologies!

It was often hard to imagine the far-reaching applications of basic physics when topics as humble as Acoustics, Optics and Magnetism were introduced in our high school physics textbooks. It is enthralling now to see how some of these basic disciplines have been applied to develop some of the most sophisticated medical technologies of today’s world. Out of this fascination, we decided to take a brief look at some of them:

Optical Coherence Tomography

Optical coherence tomography (OCT) is an emerging technology for performing high-resolution cross-sectional imaging. OCT is analogous to ultrasound imaging, except that it uses light instead of sound. It can provide cross-sectional images of tissue structure on the micron scale in situ and in real time. OCT can function as a type of optical biopsy and is a powerful imaging technology for medical diagnostics because, unlike conventional histopathology, it does not require removal of a tissue specimen and processing for microscopic examination. By using the time-delay information contained in the light waves reflected from different depths inside a sample, an OCT system reconstructs a depth profile of the sample structure. Three-dimensional images can then be created by scanning the light beam laterally across the sample surface. Lateral resolution is determined by the spot size of the light beam, whereas the depth (or axial) resolution depends primarily on the optical bandwidth of the light source. Because the two are decoupled, OCT systems may combine high axial resolutions with large depths of field, so their primary applications include in-vivo imaging through thick sections of biological systems, particularly in the human body. The figure below shows a comparison of OCT resolution and imaging depths to those of alternative techniques; the “pendulum” length represents imaging depth, and the “sphere” size represents resolution (image source – UWA).
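
The dependence of axial resolution on optical bandwidth can be made concrete with the standard coherence-length expression for a source with a Gaussian spectrum, (2 ln 2 / π) · λ₀² / Δλ, measured in air. That formula is not given in this article, and the source parameters below are hypothetical values chosen only to show the scale.

```python
import math

def axial_resolution(center_wavelength, bandwidth):
    """Coherence-length-limited axial resolution for a Gaussian spectrum (in air).

    Both arguments and the result share the same length unit.
    """
    return (2 * math.log(2) / math.pi) * center_wavelength ** 2 / bandwidth

# Hypothetical source: 1300 nm centre wavelength, 100 nm bandwidth (values in nm).
resolution_nm = axial_resolution(1300.0, 100.0)
```

For these example values the result is roughly 7.5 micrometres, and doubling the bandwidth halves it, which is why broadband light sources are preferred for fine depth resolution.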

Ultrasound Elastography

Elastography is based on the principle of physical elasticity: a pressure is applied to the examined medium and the induced strain distribution is estimated by tracking the tissue motion. It uses the visualization of the propagation of mechanical waves through the tissue to derive either a shear wave velocity or a Young’s modulus as a measure of tissue stiffness. In practical terms, RF ultrasonic data are acquired before and after the applied compression, and speckle tracking techniques, e.g., cross-correlation methods, are employed to calculate the resulting strain. The resulting strain image is called an elastogram. The primary goal of elastography was the identification and characterization of breast lesions. To acquire an elastography image, the ultrasound technician takes a regular ultrasound image and then pushes on the tissue with the ultrasound transducer to take a compression image. Normal tissue and benign tumors are typically elastic or soft and compress easily, whereas malignant tumors are stiffer and barely compress. The image below shows a traditional ultrasound image and a corresponding real-time elastogram of an ablated lesion in an ex vivo liver. In the elastogram, blue corresponds to hard tissue and red corresponds to soft tissue. The lesion is not clearly visible in the traditional ultrasound image because the ablation process does not change the echogenicity of the tissue significantly. However, the lesion is clearly visible in the elastogram (dark blue area) because the ablation process hardens the tissue significantly (image source – TAMUS).
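
The cross-correlation step can be sketched on a toy one-dimensional echo line. This is a deliberately simplified illustration, not a clinical algorithm: it uses unnormalized correlation and integer sample shifts, and the function names and signal values are made up. Strain is estimated as the change in displacement between neighbouring windows divided by the window spacing.

```python
def best_lag(pre, post, start, window, max_lag):
    """Shift (in samples) that best aligns one window of `pre` with `post`."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(start, start + window):
            j = i + lag
            if 0 <= j < len(post):
                score += pre[i] * post[j]  # unnormalized cross-correlation
        if score > best_score:
            best, best_score = lag, score
    return best

def strain_profile(pre, post, window, max_lag):
    """Strain per window pair: change in displacement over window spacing."""
    starts = range(0, len(pre) - window + 1, window)
    shifts = [best_lag(pre, post, s, window, max_lag) for s in starts]
    return [(b - a) / window for a, b in zip(shifts, shifts[1:])]

# Toy echo lines: scatterers at samples 10 and 30 move to 9 and 27 under compression.
pre = [0.0] * 40
post = [0.0] * 40
pre[10] = pre[30] = 1.0
post[9] = post[27] = 1.0
strains = strain_profile(pre, post, window=20, max_lag=5)
```

Here the shallow window shifts by one sample and the deep window by three, giving a strain of about -0.1 between them, where the negative sign indicates compression. Stiff regions would show shifts that barely change from one window to the next, and therefore strain close to zero.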


Magnetoencephalography

Magnetoencephalography (MEG) is a non-invasive technique used to measure magnetic fields generated by small intracellular electrical currents in the neurons of the brain. It allows the measurement of ongoing brain activity on a millisecond-by-millisecond basis, and it shows where in the brain activity is produced. MEG measurements are conducted externally, using an extremely sensitive device called a superconducting quantum interference device (SQUID). The SQUID is a very low noise detector of magnetic fields, which converts the magnetic flux threading a pickup coil into a voltage, allowing the detection of weak neuromagnetic signals. Since the SQUID relies on physical phenomena found in superconductors, it requires cryogenic temperatures for operation. Due to the low impedance at these temperatures, the SQUID device can detect and amplify magnetic fields generated by neurons a few centimeters away from the sensors. A magnetically shielded room houses the equipment and mitigates interference. Applications of MEG include localizing regions affected by pathology before surgical removal, determining the function of various parts of the brain, and neurofeedback.

Watch this space as we dive deeper into some of these technologies and explore the rapidly evolving medical technology landscape!

Is it high time to innovate for brown-field markets?

Green-field opportunities have traditionally been the focus area for innovators: an opportunity to demonstrate how things can be made to work better, look better and do things that were difficult to imagine. This is especially true for installations involving heavy infrastructure. It is significantly easier to adopt new ideas, concepts and products when you are setting up a new plant or process from scratch. On the other hand, there is inherently a high inertia towards trying something new in brown-field installations. This is because brown-field installations were originally designed for particular modes of production, with established practices and technologies, incumbent customers and competitors, supporting and specialized infrastructure, deep-rooted business relationships, and sometimes extensive government regulation. This reality has dissuaded potential brown-field innovators, especially in the automation OEM market.

Things have changed. With the economic downturn and a paucity of green-field opportunities, industrial product OEMs are, albeit reluctantly, looking to find some opportunities in existing installations. They are not finding it easy, however, and especially for emerging markets, they are really struggling.

There are a few aspects to the challenge of innovating in brown-field markets.  First, the innovation has to fit and co-exist with the existing technical infrastructure. Sometimes the interoperability problems can be really overwhelming and overshadow the benefits.  Unless strongly supported by economics (ROI) and a strong intent, this alone can stall innovation.  Consider the case of someone trying to innovate HVAC control systems to make them energy efficient in brown-field buildings (in India).  The reality is that the engineering is so non-standard that it is impossible to think of a one-for-all solution.  New products and processes, designed for mature, established markets, must be gauged in terms of their overall potential in order to fit within the complementary systems that make up the rest of the infrastructure.

Second, economics plays an even more significant role in brown-field innovation. Benefits are incremental in most cases and ROI terms tend to be longer. Plants have built up efficiencies over a long period by running in a certain way that is tuned to the optimum. A trained workforce and existing physical assets, often used well beyond their amortization, give incumbent competitors extremely favourable economic terms. Anything changing this optimized environment must provide especially compelling advantages.

Third, and perhaps most significant, there is the human factor: resistance to change and unwillingness to take risks in operational installations. Most operations managers tend to be production focussed and do everything they can to maximize production, sometimes sacrificing long-term efficiency and cost by taking only a short- and mid-term view.

Is there a reasonable method to ensure that sustainable and economically viable innovations are possible?  Can one really make a difference? Is there a business case?  We believe there is indeed a business case and product innovators have to take some realistic bets.

  • For starters, one should be ready to get one’s hands dirty; it is not enough to model and design solutions sitting in air-conditioned development centres. Problems have to be understood closer to where they are happening and solutions thought of accordingly.
  • More often than not, there is no one-size-fits-all solution, even for problems that look similar. On the ground, existing systems may not follow standards and there may be interoperability issues. Engineering must be done based on the use case and for the purpose.
  • While the innovation process may replace some of the engineering systems and processes, the change must avoid disturbing the core operations. This will help in easier adoption and avoiding change related issues.
  • Brownfield innovation, more often than not, demands local presence and closer ties with an ecosystem that understands and supports the need closely.

Concept Realization Accelerated – Is Open Source Hardware for real?

A lot of us grew up programming on proprietary closed platforms and were firm believers that this was the way serious products are built. It would be an understatement to say that most of us were proved wrong in our (mis)conceptions about the power of community. Younger software developers can claim that they always knew that Linux and open source software was the way to go. Hardware guys take their skills more seriously and would not be drawn to something similar … never.  At least that’s what we thought till Massimo Banzi told us that things could be made simpler in the electronics world as well … by allowing the proliferation of low cost open source hardware platforms and enabling thousands of innovators to experiment, without having to worry about a very expensive hardware design process.

Banzi co-founded what is now very popularly known to product innovators as the Arduino project, a cheap, easy to use, open source hardware platform.  Next time you have to build your own small control system in your lab, don’t bother designing your PCB … Arduino (or one of a few other ready to use open source hardware platforms) is all that you need.

Focus on the ideas … concept realization made cheap and easy.

While Arduino is without a doubt the poster child of the open source hardware movement there are others too … BeagleBone is another open-source single board computer that runs Linux. Because it’s a computer you can program your tests in any programming language you like, from C to the command line. Python, also open source, seems to be the most popular language for BeagleBone. Then there is the Raspberry Pi, which is a credit-card sized computer that plugs into your TV and a keyboard.  It’s a capable little PC which can be used for many of the things that your desktop PC does, like spreadsheets, word-processing and games. It also plays high-definition video. The developers want to see it being used by kids all over the world to learn programming.

And so on…

Most of these platforms have evolved to the extent that a wide variety of daughter board designs are available, providing an extensive range of interfacing capabilities.

While this movement might seem like one for hobbyists, there’s a larger world out there. Open source hardware enthusiasts will tell you that this will quickly prove to be a tremendous business driver enabling companies to move faster and be more agile than ever. Open Source hardware is a way of accelerating innovation.

Next time you hear something called Razdroid or the Android ADK (latter one released by Google … now that’s cool), ignore it at your own peril … this is Android on your $30 board.