Wednesday, July 25, 2012

HOW TO RUN MAPREDUCE PROGRAMS USING ECLIPSE

Hadoop provides a plugin for Eclipse that lets us connect our Hadoop cluster to Eclipse. We can then run MapReduce jobs and browse HDFS from within Eclipse itself. But a few things need to be done in order to achieve that. Normally, it is said that we just have to copy hadoop-eclipse-plugin-*.jar into the eclipse/plugins directory to get things going. But unfortunately it did not work for me. When I tried to connect Eclipse to my Hadoop cluster, it threw this error :


An internal error occurred during: "Map/Reduce location status updater".
org/codehaus/jackson/map/JsonMappingException

You may face a slightly different error, but it will be something similar to this. This happens because some required jars are missing from the plugin that ships with Hadoop. (If you want the full stack trace, Eclipse writes it to workspace/.metadata/.log.) I tried a few things, and one of them worked.



So, I thought of sharing it, so that anybody else facing the same issue can try it out. Just try the steps outlined below and let me know if it works for you.

First of all, set up a Hadoop cluster properly on your machine. If you need some help with that, just go here. Then download an Eclipse build compatible with your environment from the Eclipse home page. Also set your HADOOP_HOME environment variable to point to your Hadoop folder.
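For example, on a Linux box that might look like this (the path below is just an illustration; point it at your own Hadoop directory):

    export HADOOP_HOME=/home/user/hadoop-0.20.203.0
    export PATH=$PATH:$HADOOP_HOME/bin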

Now, follow these steps (a complete command-line sketch follows the list) :

1- Go to your HADOOP_HOME/contrib folder. Copy hadoop-eclipse-plugin-*.jar somewhere and extract it. This will give you a folder named hadoop-eclipse-plugin-*.

2- Now, add the following 5 jars from your HADOOP_HOME/lib folder to the hadoop-eclipse-plugin-*/lib folder that you just got by extracting the plugin :
    commons-configuration-1.6.jar
    commons-httpclient-3.0.1.jar
    commons-lang-2.4.jar
    jackson-core-asl-1.0.1.jar
    jackson-mapper-asl-1.0.1.jar


3- Now, modify the hadoop-eclipse-plugin-*/META-INF/MANIFEST.MF file and change the Bundle-ClassPath entry to :
Bundle-ClassPath: classes/,lib/hadoop-core.jar,lib/commons-cli-1.2.jar,lib/commons-httpclient-3.0.1.jar,lib/jackson-core-asl-1.0.1.jar,lib/jackson-mapper-asl-1.0.1.jar,lib/commons-configuration-1.6.jar,lib/commons-lang-2.4.jar
(Keep the value on a single logical line; if you do wrap it, each continuation line in a manifest must begin with a single space.)

4- Now, re-jar the package, place the new jar inside the eclipse/plugins directory, and restart Eclipse.
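Putting it all together, here is a rough command-line sketch of steps 1-4. The version number and paths are illustrative (I used 0.20.203.0; substitute your own), and the exact location of the plugin jar under HADOOP_HOME may vary slightly between releases :

    # Illustrative name -- adjust to the jar that ships with your release
    PLUGIN=hadoop-eclipse-plugin-0.20.203.0.jar

    # Step 1: copy the plugin jar somewhere and extract it
    cp $HADOOP_HOME/contrib/eclipse-plugin/$PLUGIN /tmp/
    mkdir /tmp/hadoop-eclipse-plugin
    cd /tmp/hadoop-eclipse-plugin
    jar -xf /tmp/$PLUGIN

    # Step 2: add the five missing jars from HADOOP_HOME/lib
    cp $HADOOP_HOME/lib/commons-configuration-1.6.jar \
       $HADOOP_HOME/lib/commons-httpclient-3.0.1.jar \
       $HADOOP_HOME/lib/commons-lang-2.4.jar \
       $HADOOP_HOME/lib/jackson-core-asl-1.0.1.jar \
       $HADOOP_HOME/lib/jackson-mapper-asl-1.0.1.jar lib/

    # Step 3: edit META-INF/MANIFEST.MF in a text editor and update
    # Bundle-ClassPath as shown above

    # Step 4: re-jar (the edited manifest travels with the folder) and
    # install into the Eclipse plugins directory, then restart Eclipse
    rm /tmp/$PLUGIN
    zip -r /tmp/$PLUGIN *
    cp /tmp/$PLUGIN /path/to/eclipse/plugins/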

You are good to go now. Do let me know if it doesn't work for you.

NOTE: For details you can visit the official home page.

Thank you.

EDIT: If you are not able to see the job status in the JobTracker web UI (port 50030; port 50070 is the NameNode UI), you might find this post of mine useful.

8 comments:

  1. Hi
    I have this error:
    An internal error occurred during: "Connecting to DFS hadoop-master01".
    org/apache/commons/configuration/Configuration

    Please advise.

  2. Hi, I am using hadoop-1.0.1 on a Windows machine with Cygwin, and I didn't find any hadoop-eclipse-plugin-*.jar in my Hadoop installation. I found a folder at hadoop-1.0.1\src\contrib\eclipse-plugin, but there was no jar anywhere for Eclipse. What to do now?

  3. Hello DeviKiran, I had tested it long back using 0.20.203.0, and the jar was present in that distribution. I don't know the reason, but they have removed it from the 1.x releases. Try building it yourself from the source under src/contrib/eclipse-plugin and creating the jar.
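
    In case it helps, the plugin build is driven by Ant; a rough sketch follows, but the property names vary by release, so check src/contrib/eclipse-plugin/build.xml before running this:

        cd $HADOOP_HOME/src/contrib/eclipse-plugin
        ant jar -Declipse.home=/path/to/eclipse -Dversion=1.0.1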

    Replies
    1. Hey, I did build the jar and added those Jackson jars you mentioned to lib and re-jarred it, but it's not getting detected in Eclipse. What could be the reason? Any specific way of jarring it up?

    2. Hi Rohit, nothing special..just a usual jar..make sure you are meeting all the dependencies..perhaps some dependency is missing..give it a few more shots, and if it still doesn't work just let me know..I'll send you the working jar, in case you need it.
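
       For instance, you can quickly check whether the extra jars actually made it into the re-jarred plugin with something like this (illustrative file name):

           unzip -l hadoop-eclipse-plugin-0.20.203.0.jar | grep -E 'commons|jackson'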

  4. I'm trying to get this to work with 1.1.1. I built the plugin, unjarred it, and added the specified jar files, but I still can't get it to work. It returns the same error.

  5. Hi Tariq... I have one issue with running a Hadoop program through Eclipse. When I run the program through Eclipse, the web interface of Hadoop doesn't show any progress, and there is no entry in the logs either, but it actually runs, as the output can be seen in the Eclipse console. When I run the same program through the terminal, I don't have this problem. I didn't find any appropriate answer to this problem on Google.

  6. Hello ma'am, this is happening because Eclipse is running the job with the local job runner instead of actually submitting it to the JobTracker. Adding these 2 lines to the driver code should do the trick :
    conf.set("fs.default.name", "hdfs://YOUR_NN_HOSTNAME:9000");    // point at the NameNode
    conf.set("mapred.job.tracker", "YOUR_JT_HOSTNAME:9001");    // submit jobs to the JobTracker

