Running Hadoop on a Raspberry Pi 2 cluster



I’ve been involved with cluster computing ever since DEC released VAXclusters in 1984. In those days, a 3-node VAXcluster cost about $1 million. Today you can build a far more powerful cluster for under $1,000, with far more storage than anyone could afford back then.

Hadoop is the open-source version of Google’s Map/Reduce and Google File System (GFS), widely used for big data-crunching applications. It is a shared-nothing cluster, which means that as you add cluster nodes, performance scales up smoothly.
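To see why shared-nothing scales, here's a toy Python sketch of the map/reduce word-count pattern (not Hadoop code, just the idea): mappers emit (word, 1) pairs from their own slice of the data, a shuffle groups pairs by word, and each reducer sums one group independently of all the others.

```python
from collections import defaultdict
from itertools import chain

def mapper(document):
    # Map phase: emit a (word, 1) pair for every word in one document.
    # Each cluster node can run this on its own data, sharing nothing.
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    # Shuffle phase: group counts by word, as Hadoop does between phases.
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reducer(word, counts):
    # Reduce phase: sum the counts for one word, independently of others.
    return word, sum(counts)

def word_count(documents):
    pairs = chain.from_iterable(mapper(d) for d in documents)
    return dict(reducer(w, c) for w, c in shuffle(pairs).items())
```

Because the map and reduce steps never touch shared state, adding nodes simply spreads the documents and the word groups across more machines.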

In the paper, Performance of a Low Cost Hadoop Cluster for Image Analysis, researchers Basit Qureshi, Yasir Javed, Anis Koubaa, Mohamed-Foued Sriti, and Maram Alajlan built a 20-node RPi Model 2 cluster, brought up Hadoop on it, and used it for surveillance drone image analysis. They also benchmarked the RPi cluster against a 4-node PC cluster based on 3GHz Intel i7 CPUs, each with 4GB of RAM.


The 20-node cluster was divided into four 5-node subnets, each attached to a 16-port switch that is, in turn, networked to a managed 24-port core switch. The extra switch ports allow easy cluster expansion.

Each 700MHz RPi B runs Raspbian, an ARM-optimized version of Debian Linux. Each RPi has a Class 10, 16GB SD card capable of up to 80MB/s read/write speeds. An image of the OS with Hadoop 2.6.2 was copied onto the SD cards. The Hadoop master node, which implements the name-node only, was installed on a PC running Ubuntu 14.04 and Hadoop.

Performance results

You would expect a cluster of 64-bit, 3GHz x86 CPUs to be much faster than 700MHz, 32-bit ARM CPUs, and you would be right. The team ran a series of tests that were a) compute-intensive (calculating Pi), b) I/O intensive (document word counts), and c) both (large image file pixel counts).
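For flavor, the compute-intensive test boils down to estimating Pi by sampling. The paper doesn't reproduce its code, so this is just a single-process Python sketch of the Monte Carlo idea (Hadoop's bundled pi example uses a quasi-Monte Carlo variant, but the arithmetic is the same):

```python
import random

def estimate_pi(samples, seed=0):
    # Draw points uniformly in the unit square; the fraction landing
    # inside the quarter circle x^2 + y^2 <= 1 approaches pi/4.
    rng = random.Random(seed)
    inside = sum(1 for _ in range(samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / samples
```

On a cluster, each node draws its own batch of points and a single reduce step merges the inside/total counts, so the work parallelizes with almost no communication.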

Here are the word count results, taken from a figure in the paper.


Courtesy of the authors

In general, the x86 cluster was 10-20 times faster. However, the ability to put a Hadoop cluster in a backpack with a battery opens up possibilities for powerful edge computing, such as the drone video pre-processing the authors explore in their paper. Also, today we have the RPi Model 3, with a processor running at nearly double the clock speed of the RPi tested by the researchers.

The Storage Bits take

Mobile edge clusters aren't a thing today, but they will be, because our ability to gather data at the edge is growing much faster than network bandwidth to the edge. We'll have to pre-process, for example, IoT data to compact it for network transmission.
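As a hypothetical illustration of that compaction step: sensor streams are highly redundant, so even generic compression shrinks them dramatically before they hit the network (a real deployment would more likely use delta encoding or a domain-specific codec, but the principle is the same):

```python
import json
import zlib

# Hypothetical IoT readings: slowly varying temperature samples are
# extremely repetitive, which makes them very compressible.
readings = [{"sensor": "temp-1", "t": i, "value": 21.5} for i in range(1000)]

raw = json.dumps(readings).encode("utf-8")
compacted = zlib.compress(raw, level=9)

# The compressed payload is a small fraction of the raw JSON.
print(len(raw), len(compacted))
```

That size difference is exactly the bandwidth an edge cluster would save by pre-processing before transmission.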

When will they be economically viable? Three things have to happen:

  • Mobile processors have to get faster, while remaining power efficient.
  • More power-efficient memory – whether low-power DRAM or NVRAM – must enable larger memory capacities on mobile processors.
  • Universal Flash Storage (UFS) support on mobile processors, eliminating the current storage bottleneck of micro-SD cards.

All three will happen in the next five years. Then backpack clusters will be capable of real work out in the wild.

Courteous comments welcome, of course.

