Thursday, June 14, 2012

HOW TO CHANGE THE DEFAULT KEY-VALUE SEPARATOR OF A MAPREDUCE JOB


The default MapReduce output format, TextOutputFormat, writes records as lines of text. Its keys and values may be of any type, since TextOutputFormat converts them to strings by calling toString() on them.

Within each line, the key and the value are separated by a tab character. We can change this separator to a string of our choice using the mapreduce.output.textoutputformat.separator property (in the older MapReduce API this was mapred.textoutputformat.separator).

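To do this, set the property on the job's Configuration object in your driver function, before you create the Job from it (here conf is that Configuration instance) -
conf.set("mapreduce.output.textoutputformat.separator", ",");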
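To see what the setting changes in the output, here is a minimal plain-Java sketch with no Hadoop dependency. It only mimics the behaviour described above: each record becomes the key's toString(), the separator, the value's toString(), and a newline. The class SeparatorSketch and its methods formatRecord and render are hypothetical names made up for this illustration. They are not part of the Hadoop API, and the real TextOutputFormat handles details this sketch leaves out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SeparatorSketch {

    // Hypothetical helper (not a Hadoop API): builds one output line
    // from a key, a value and the configured separator.
    public static String formatRecord(Object key, Object value, String separator) {
        return key.toString() + separator + value.toString() + "\n";
    }

    // Renders every record in iteration order, one line per key-value pair.
    public static String render(Map<?, ?> records, String separator) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<?, ?> entry : records.entrySet()) {
            out.append(formatRecord(entry.getKey(), entry.getValue(), separator));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3);
        counts.put("spark", 5);

        System.out.print(render(counts, "\t")); // default: tab-separated
        System.out.print(render(counts, ","));  // after changing the separator to a comma
    }
}
```

With the tab separator the first record is written as hadoop, a tab, then 3. With the comma separator the same record is written as hadoop,3, which is the effect the configuration line has on the job's output files.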
