CloudFront: HOW TO RUN MAPREDUCE PROGRAMS USING ECLIPSE

Wednesday, July 25, 2012

Hadoop provides a plugin for Eclipse that connects Eclipse to a Hadoop cluster. With it, we can run MapReduce jobs and browse HDFS from within Eclipse itself. Getting there takes a few extra steps, though. The usual advice is simply to copy hadoop-eclipse-plugin-*.jar into the eclipse/plugins directory. Unfortunately, that did not work for me. When I tried to connect Eclipse to my Hadoop cluster, it threw this error:

An internal error occurred during: "Map/Reduce location status updater".
org/codehaus/jackson/map/JsonMappingException

You may see a different error, but it will likely be similar to this one. It happens because some required jars are missing from the plugin that ships with Hadoop. I tried a few things and eventually got it working.
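To see which jar is missing, one option is to search the jars bundled inside the plugin for the class named in the error. The following is only a minimal sketch: the class name `PluginClassFinder` and its methods are hypothetical helpers made up for this illustration, and it only looks inside the nested .jar files of the plugin, not at loose class files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;

// Hypothetical helper, not part of Hadoop or Eclipse.
class PluginClassFinder {

    /** Turns a class name (slash or dot separated) into its entry name inside a jar. */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the nested jars inside the plugin jar that contain the given class. */
    static List<String> findProviders(String pluginJarPath, String className) throws IOException {
        String wanted = toEntryName(className);
        List<String> providers = new ArrayList<>();
        try (JarFile plugin = new JarFile(pluginJarPath)) {
            Enumeration<JarEntry> entries = plugin.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".jar")) {
                    continue; // only nested jars (e.g. lib/*.jar) are inspected
                }
                try (InputStream in = plugin.getInputStream(entry);
                     JarInputStream nested = new JarInputStream(in)) {
                    JarEntry inner;
                    while ((inner = nested.getNextJarEntry()) != null) {
                        if (inner.getName().equals(wanted)) {
                            providers.add(entry.getName());
                            break;
                        }
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to hadoop-eclipse-plugin-*.jar
        // args[1]: class from the error, e.g. org/codehaus/jackson/map/JsonMappingException
        List<String> providers = findProviders(args[0], args[1]);
        System.out.println(providers.isEmpty()
                ? "not bundled: " + args[1]
                : "provided by: " + providers);
    }
}
```

If the class is reported as not bundled, the jar that provides it is a candidate to add to the plugin's lib folder.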

I am sharing the fix so that anybody else facing the same issue can try it out. Follow the steps outlined below and let me know if they work for you.

First of all, set up a Hadoop cluster properly on your machine. If you need some help with that, just go here. Then download the Eclipse build compatible with your environment from the Eclipse home page. Also set HADOOP_HOME to point to your Hadoop folder.
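Before going further, it can help to confirm that HADOOP_HOME is set and that the plugin jar really is somewhere under HADOOP_HOME/contrib, since not every release seems to ship it. Here is a rough sketch of such a check. `PluginLocator` is a hypothetical name chosen for this example, and it searches contrib recursively because the exact subfolder is an assumption that may differ between releases.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop.
class PluginLocator {

    /** True for file names that match hadoop-eclipse-plugin-*.jar. */
    static boolean isPluginJar(String fileName) {
        return fileName.startsWith("hadoop-eclipse-plugin-") && fileName.endsWith(".jar");
    }

    /** Lists every plugin jar found anywhere below <hadoopHome>/contrib. */
    static List<Path> findPluginJars(Path hadoopHome) throws IOException {
        Path contrib = hadoopHome.resolve("contrib");
        if (!Files.isDirectory(contrib)) {
            return Collections.emptyList();
        }
        try (Stream<Path> files = Files.walk(contrib)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> isPluginJar(p.getFileName().toString()))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        List<Path> jars = findPluginJars(Paths.get(home));
        if (jars.isEmpty()) {
            System.out.println("No hadoop-eclipse-plugin-*.jar found under " + home + "/contrib");
        } else {
            jars.forEach(System.out::println);
        }
    }
}
```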

Now, follow these steps:

1- Go to your HADOOP_HOME/contrib folder. Copy the hadoop-eclipse-plugin-*.jar somewhere and extract it. This will give a folder named hadoop-eclipse-plugin-*

2- Now, add the following 5 jars from your HADOOP_HOME/lib folder to the hadoop-eclipse-plugin-*/lib folder that you got after extracting the plugin:
commons-configuration-1.6.jar
commons-httpclient-3.0.1.jar
commons-lang-2.4.jar
jackson-core-asl-1.0.1.jar
jackson-mapper-asl-1.0.1.jar

3- Now, modify the hadoop-eclipse-plugin-*/META-INF/MANIFEST.MF file and change the Bundle-ClassPath to :

Bundle-ClassPath: classes/, lib/hadoop-core.jar, lib/commons-cli-1.2.jar, lib/commons-httpclient-3.0.1.jar, lib/jackson-core-asl-1.0.1.jar, lib/jackson-mapper-asl-1.0.1.jar, lib/commons-configuration-1.6.jar, lib/commons-lang-2.4.jar

4- Now, re-jar the package, place the new jar inside the eclipse/plugins directory, and restart Eclipse.
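For anyone who would rather script steps 1 to 4 than do them by hand, the sketch below does roughly the same thing with the standard java.util.jar classes. Treat it as an illustration under a few assumptions, not a tested tool: `PluginRepacker` is a hypothetical name, the plugin jar is assumed to be unsigned, and instead of writing the exact Bundle-ClassPath line shown in step 3 it appends the five lib/ entries to whatever value the manifest already has. The output path must be different from the input jar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Hadoop or Eclipse.
class PluginRepacker {

    // The five jars listed in step 2.
    static final List<String> EXTRA_JARS = Arrays.asList(
            "commons-configuration-1.6.jar",
            "commons-httpclient-3.0.1.jar",
            "commons-lang-2.4.jar",
            "jackson-core-asl-1.0.1.jar",
            "jackson-mapper-asl-1.0.1.jar");

    /** Appends lib/<jar> entries to an existing Bundle-ClassPath value, skipping duplicates. */
    static String extendClassPath(String existing, List<String> jars) {
        Set<String> entries = new LinkedHashSet<>();
        if (existing != null) {
            for (String part : existing.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    entries.add(trimmed);
                }
            }
        }
        for (String jar : jars) {
            entries.add("lib/" + jar);
        }
        return String.join(",", entries);
    }

    /** Copies the plugin jar to output, adding the extra jars and the updated manifest. */
    static void repack(Path pluginJar, Path hadoopLib, Path output) throws IOException {
        try (JarFile in = new JarFile(pluginJar.toFile())) {
            if (in.getManifest() == null) {
                throw new IOException("no manifest in " + pluginJar);
            }
            Manifest manifest = new Manifest(in.getManifest());
            String old = manifest.getMainAttributes().getValue("Bundle-ClassPath");
            manifest.getMainAttributes().putValue("Bundle-ClassPath", extendClassPath(old, EXTRA_JARS));

            Set<String> written = new HashSet<>();
            try (OutputStream file = Files.newOutputStream(output);
                 JarOutputStream out = new JarOutputStream(file, manifest)) {
                // Copy every original entry except the manifest, which was written above.
                Enumeration<JarEntry> entries = in.entries();
                while (entries.hasMoreElements()) {
                    JarEntry entry = entries.nextElement();
                    if (entry.getName().equals(JarFile.MANIFEST_NAME)) {
                        continue;
                    }
                    out.putNextEntry(new JarEntry(entry.getName()));
                    if (!entry.isDirectory()) {
                        try (InputStream data = in.getInputStream(entry)) {
                            byte[] buffer = new byte[8192];
                            int n;
                            while ((n = data.read(buffer)) != -1) {
                                out.write(buffer, 0, n);
                            }
                        }
                    }
                    out.closeEntry();
                    written.add(entry.getName());
                }
                // Add the jars from HADOOP_HOME/lib that are not bundled yet.
                for (String jar : EXTRA_JARS) {
                    String name = "lib/" + jar;
                    if (written.contains(name)) {
                        continue;
                    }
                    out.putNextEntry(new JarEntry(name));
                    Files.copy(hadoopLib.resolve(jar), out);
                    out.closeEntry();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args: <plugin jar> <HADOOP_HOME/lib> <output jar>
        repack(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

The resulting jar still has to be copied into the eclipse/plugins directory by hand, followed by an Eclipse restart.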

You are good to go now. Do let me know if it doesn't work for you.

NOTE : For details you can visit the official home page.

Thank you.

EDIT : If you are not able to see the job status in the JobTracker web UI (port 50030 by default; 50070 is the NameNode UI), you might find this post of mine useful.

8 comments:

me.busybody, August 19, 2012 at 2:25 PM
Hi
I have this error.
An internal error occurred during: "Connecting to DFS hadoop-master01".
org/apache/commons/configuration/Configuration

Please advise.
Unknown, August 28, 2012 at 10:47 AM
Hi, I am using hadoop-1.0.1 on a Windows machine with Cygwin, and I didn't find any hadoop-eclipse-plugin-*.jar in my Hadoop installation. I found a folder at hadoop-1.0.1\src\contrib\eclipse-plugin, but there was no jar for Eclipse anywhere. What should I do now?
Tariq, August 28, 2012 at 6:49 PM
Hello DeviKiran, I tested this a long time ago using 0.20.203.0, and the jar was present in that distribution. I don't know the reason, but it has been removed from the 1.x release. Try building the plugin from the source and creating the jar yourself.
Anonymous, December 4, 2012 at 11:32 PM
I'm trying to get this to work in 1.1.1. I built the plugin, unjarred it, and added the specified jar files, but I still can't get it to work. It returns the same error.
priya, March 21, 2013 at 11:29 AM
Hi Tariq, I have an issue with running a Hadoop program through Eclipse. When I run the program through Eclipse, the Hadoop web interface doesn't show any progress and there is no entry in the logs, but the job actually runs, as the output can be seen in the Eclipse console. When I run the same program from the terminal, I don't have this problem. I couldn't find an appropriate answer to this problem on Google.
Tariq, March 21, 2013 at 4:07 PM
Hello ma'am, this happens because Eclipse is running everything locally on its own instead of actually submitting the job to the JobTracker (JT). Adding these 2 lines to the code should do the trick:
conf.set("fs.default.name", "YOUR_NN_HOSTNAME:9000");
conf.set("mapred.job.tracker", "YOUR_JT_HOSTNAME:9001");
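To make the two settings above a little more concrete, here is a small sketch that collects them in a map so that it runs without Hadoop on the classpath. `RemoteClusterSettings` is a hypothetical helper invented for this example. Ports 9000 and 9001 are taken from the two lines above, while the explicit hdfs:// prefix is my assumption about the usual full URI form of fs.default.name. In a real driver each pair would simply be passed to conf.set(key, value) before the job is submitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
class RemoteClusterSettings {

    /** Builds the two properties that point a job at a remote NameNode and JobTracker. */
    static Map<String, String> forCluster(String nameNodeHost, String jobTrackerHost) {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumption: full URI form with an explicit hdfs:// scheme.
        props.put("fs.default.name", "hdfs://" + nameNodeHost + ":9000");
        props.put("mapred.job.tracker", jobTrackerHost + ":9001");
        return props;
    }

    public static void main(String[] args) {
        // Replace the placeholders with the real host names of your cluster.
        Map<String, String> props = forCluster("YOUR_NN_HOSTNAME", "YOUR_JT_HOSTNAME");
        for (Map.Entry<String, String> e : props.entrySet()) {
            // In a driver: conf.set(e.getKey(), e.getValue());
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```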