Thursday, May 24, 2012

Tips for Hadoop newbies (Part I).

A few months ago, after completing my graduation, I thought of doing something new. In that quest I started learning and working on Apache's platform for distributed computing, Apache Hadoop. Like a good student I started with reading the documentation. Trust me, there are many good posts and documents available for learning Hadoop and setting up a Hadoop cluster. But even after following everything properly, at times I ran into problems and could not find solutions for them. I posted questions on the mailing lists, searched over the internet, asked the experts and finally got my issues resolved. But it took a lot of precious time and effort. Hence I decided to write those things down, so that anyone who is just starting off doesn't have to face the same hurdles.

Please share your valuable comments and suggestions if you have any. That will help me a lot in refining things further and adding to my knowledge, as I am still a learner.

1 - If there is some problem with the Namenode, first of all check your hosts file. Proper DNS resolution is very important for a Hadoop cluster to work properly. Then check whether ssh is working fine or not. For a pseudo-distributed mode configuration your hosts file should look like this -
     127.0.0.1 localhost
     127.0.0.1 ubuntu.ubuntu-domain ubuntu
     For a fully-distributed configuration, add the IP addresses and hostnames of all the nodes accordingly.
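For example, on a small fully-distributed cluster the hosts file on every node might look something like the sketch below. The IP addresses and hostnames (master, slave1, slave2) are just placeholders for illustration; use the ones from your own network -
     192.168.1.10 master
     192.168.1.11 slave1
     192.168.1.12 slave2
To verify passwordless ssh (the machine running the Namenode should be able to log in to every node, including itself, without a password), you can do something like the following, where hduser is just a placeholder for whatever user runs the Hadoop daemons on your cluster -
     ssh-keygen -t rsa -P ""
     ssh-copy-id hduser@slave1
     ssh slave1
If the last command drops you into a shell on slave1 without asking for a password, ssh is set up fine for that node.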
