Posts

Showing posts from April, 2013

Hadoop Herd : When to use What...

Eight years ago, not even Doug Cutting would have thought that the tool he was naming after his kid's soft toy would so soon become a rage and change the way people and organizations look at their data. Today Hadoop and BigData have almost become synonyms. But Hadoop is not just Hadoop now. Over time it has evolved into one big herd of various tools, each meant to serve a different purpose, and glued together they give you a power-packed combo. Having said that, one must be careful when choosing among these tools for a specific use case, as one size doesn't fit all. What works for someone else might not be that productive for you. So here I am trying to show you which tool should be picked in which scenario. It's not a big comparative study but a short intro to some very useful tools. And I am really not an expert or an authority, so there is always scope for suggestions. Please feel free to comment or suggest if you have any. I wou

Hadoop+Ubuntu : The Big Fat Wedding.

Now, here is a treat for all you Hadoop and Ubuntu lovers. Last month, Canonical, the organization behind the Ubuntu operating system, partnered with MapR, one of the Hadoop heavyweights, in an effort to make Hadoop available as an integrated part of Ubuntu through its repositories. The partners announced that MapR's M3 Edition for Apache Hadoop will be packaged and made available for download as an integrated part of the Ubuntu operating system. Canonical and MapR are also working to develop a Juju Charm that can be used by OpenStack and other customers to easily deploy MapR into their environments. The free MapR M3 Edition includes HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume and other Hadoop-related components for unlimited production use. MapR M3 will be bundled with Ubuntu 12.04 LTS and 12.10 via the Ubuntu Partner Archive. MapR also announced that the source code for the component packages of the MapR Distribution for Apache Hadoop is now publicly available on Gi

Is your data really Big(Data)??

The advent of so many notable tools and technologies for handling BigData problems has made life easier for a lot of people and organizations. Many of these tools are open source, well supported, and backed by good, active communities. But there is another side to it. When things become easy, free, well supported and abundant, we often start to over-utilize them. Having said that, I would like to share one incident. We organize Hadoop meetups here in Bangalore (India). In one of the initial meetings we decided to exchange views on how we are using Hadoop and other related projects. There I noticed that a lot of folks were either using or planning to use Hadoop for problems which could easily be solved using traditional systems. In fact, they could be solved in a much better and more efficient way. There was absolutely no need to use Hadoop for these kinds of problems. So, it raised a question in my mind. The question was, are we really getting the '

Happy Birthday Hadoop

Although I am a bit late, it is still worth wishing a happy birthday to the most significant 'Computer Science Thing' I have known since I got my computer science senses. You might find me biased towards Hadoop, but I am actually helpless when it comes to Hadoop. I started my career as a Hadoop developer, so I'll always have that 'first love' kinda feeling for Hadoop. Back in 2004, not even Doug Cutting would have thought that Hadoop would so quickly grow into one of the most powerful computing platforms when he started to work on a platform for distributed storage and processing, inspired by those two great papers from Google on GFS (Google File System) and MapReduce, which he later named 'Hadoop' after his kid's toy elephant. And here we are today. It was mid-2006 when I first heard about Hadoop, at an Open Source Conference held here in Bangalore (India). But I never knew at that time that this is the piece of technology that is going to