Wednesday, May 1, 2013

How to install MapR M3 on Ubuntu through the Ubuntu Partner Archive

In a recent post I mentioned the partnership between MapR and Canonical, an initiative to make Hadoop natively available on Ubuntu through the Ubuntu Partner Archive. Since the package has now been released, I thought I'd show how to get it done. Trust me, it's really cool to install Hadoop with just one apt-get install :)

First things first. Open your sources.list file (/etc/apt/sources.list) and add the MapR repositories to it.

deb mapr optional
deb binary/

Now update your package index.
sudo apt-get update

Note: If apt-get reports an error about the MapR repositories, uncomment the lines in sources.list that enable software from Canonical's partner repository.

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
deb precise partner 
deb-src precise partner
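
If you are not sure whether the partner repository is already enabled, a quick check like the one below can tell you before you start editing. This is only a minimal sketch: the function name partner_enabled is my own, and it assumes the usual sources.list layout where an enabled entry is an uncommented deb or deb-src line ending in "partner".

```shell
#!/bin/sh
# Succeeds if the given file has an uncommented "deb" or "deb-src" line
# whose last field is "partner". The function name is hypothetical;
# the default path is the standard /etc/apt/sources.list.
partner_enabled() {
    file="${1:-/etc/apt/sources.list}"
    grep -Eq '^[[:space:]]*deb(-src)?[[:space:]].*[[:space:]]partner[[:space:]]*$' \
        "$file" 2>/dev/null
}

if partner_enabled; then
    echo "partner repository is enabled"
else
    echo "partner repository is not enabled (or sources.list was not found)"
fi
```

Lines that still begin with # are ignored, so a commented-out entry is reported as not enabled.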

Install Hadoop.
sudo apt-get install mapr-single-node

1, 2, 3... and you are done. Isn't that cool? Just three easy steps and you have a brand new single-node Hadoop cluster in your lap. But there are some prerequisites, and it's very important to satisfy them.

CPU : 64-bit

OS : Red Hat, CentOS, SUSE, or Ubuntu

Memory : 4 GB minimum, more in production

Disk : Raw, unformatted drives and partitions

DNS : Hostname, reaches all other nodes

Users : Common users across all nodes; Keyless ssh

Java : Must run Java

Other : NTP, Syslog, PAM
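
A few of the prerequisites above (CPU, memory and Java) can be checked from the shell before installing. The script below is a rough sketch, not an official MapR check: the function names are my own, and the 4194304 kB threshold is simply 4 GB expressed in kB. A machine with 4 GB of RAM usually reports a little less than that in /proc/meminfo, so treat a near miss with some judgment.

```shell
#!/bin/sh
# Hypothetical helpers for a quick pre-install sanity check.

# Succeeds when the machine string (default: uname -m) is 64-bit x86.
is_64bit() {
    case "${1:-$(uname -m)}" in
        x86_64|amd64) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when total memory in kB (default: MemTotal from /proc/meminfo)
# is at least 4 GB, i.e. 4 * 1024 * 1024 kB.
has_min_memory() {
    kb="${1:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)}"
    [ "${kb:-0}" -ge 4194304 ]
}

# Succeeds when a java executable is found on the PATH.
has_java() {
    command -v java >/dev/null 2>&1
}

for check in is_64bit has_min_memory has_java; do
    if "$check"; then
        echo "$check: ok"
    else
        echo "$check: FAILED"
    fi
done
```

The remaining items (raw disks, DNS, common users, NTP and so on) depend on your setup and are best verified by hand.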

The above procedure will install the following services on your machine:

CLDB : mapr-cldb

JobTracker : mapr-jobtracker

MapR Control Server : mapr-webserver

MapR Data Platform : mapr-fileserver

Metrics : mapr-metrics

NFS : mapr-nfs

TaskTracker : mapr-tasktracker

ZooKeeper : mapr-zookeeper
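
To confirm that the packages listed above actually landed on the machine, you can ask dpkg about each one. This is a small sketch with a helper name (is_installed) of my own choosing. It relies on dpkg-query reporting the status string "install ok installed" for a fully installed package.

```shell
#!/bin/sh
# Package names taken from the service list in this post.
MAPR_PACKAGES="mapr-cldb mapr-jobtracker mapr-webserver mapr-fileserver mapr-metrics mapr-nfs mapr-tasktracker mapr-zookeeper"

# Succeeds when dpkg reports the package as fully installed.
# Fails if the package is unknown or dpkg-query is unavailable.
is_installed() {
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    [ "$status" = "install ok installed" ]
}

for pkg in $MAPR_PACKAGES; do
    if is_installed "$pkg"; then
        echo "$pkg: installed"
    else
        echo "$pkg: not installed"
    fi
done
```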

To install other Hadoop projects, and for further details, you can visit the official documentation here.

I hope you found this post helpful, and as always comments and suggestions are welcome.
