Thursday, June 14, 2012
HOW TO CHANGE THE DEFAULT KEY-VALUE SEPARATOR OF A MAPREDUCE JOB
The default MapReduce output format, TextOutputFormat, writes records as lines of text. Its keys and values may be of any type, since TextOutputFormat converts them to strings by calling toString() on them.
Within each line, the key and the value are separated by a tab character. We can change this separator to a character of our choice using the mapreduce.output.textoutputformat.separator property (in the older MapReduce API this was mapred.textoutputformat.separator).
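To make the behaviour concrete, here is a minimal plain-Java sketch of how one output line is assembled. It does not use Hadoop at all: the class SeparatorSketch, the method formatRecord, and the Map standing in for the job configuration are hypothetical names made up for illustration, and only the property name and the tab default come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a stand-in for what a text output format does per record.
class SeparatorSketch {
    // Property name taken from the post; everything else here is hypothetical.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Builds one output line: key.toString(), then the separator, then value.toString().
    // Falls back to a tab when the property is not set.
    static String formatRecord(Object key, Object value, Map<String, String> conf) {
        String separator = conf.getOrDefault(SEPARATOR_KEY, "\t");
        return key.toString() + separator + value.toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(formatRecord("apple", 3, conf)); // key and value separated by a tab

        conf.put(SEPARATOR_KEY, ",");
        System.out.println(formatRecord("apple", 3, conf)); // key and value separated by a comma
    }
}
```

Because the separator is read from the configuration, changing a single property is enough to change every line the job writes.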
To do this, set the property on the Configuration object in your driver, before the Job is created from it:
Configuration conf = new Configuration();
conf.set("mapreduce.output.textoutputformat.separator", ",");
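For context, a complete driver might look roughly like the sketch below. It uses the mapreduce.output.textoutputformat.separator property named earlier and assumes Hadoop 2 or later with the Hadoop client libraries on the classpath. The class name CommaSeparatedDriver, the job name, and the use of args[0] and args[1] as input and output paths are assumptions made for illustration. No mapper or reducer is set, so the default identity classes are used and the job simply rewrites its input with a comma between key and value.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class; the name and argument layout are illustrative.
public class CommaSeparatedDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set the separator before creating the Job: the Job takes a copy of
        // the configuration, so later changes to conf would not reach it.
        conf.set("mapreduce.output.textoutputformat.separator", ",");

        Job job = Job.getInstance(conf, "comma-separated output");
        job.setJarByClass(CommaSeparatedDriver.class);

        // With the default TextInputFormat and identity mapper/reducer,
        // keys are byte offsets (LongWritable) and values are lines (Text).
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If the Job object already exists, the same property can be set through job.getConfiguration().set(...) instead.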