Wednesday, December 19, 2012

India to Open Up Its Data

Here comes a moment to be proud of: India has joined a select group of over 20 countries whose governments have launched open data portals. With a view to improving government transparency and efficiency, the portal (currently in beta) will provide access to a valuable repository of datasets from government departments, ministries, agencies, and autonomous bodies.

Data Portal India is a platform supporting the Open Data initiative of the Government of India. The portal is intended to be used by Ministries, Departments, and Organizations of the Government of India to publish datasets and applications for public use. It aims to increase transparency in the functioning of the Government and also opens avenues for many more innovative uses of government data, offering fresh perspectives.

The entire product is available for download on GitHub, the open source code-sharing platform.

The open data will consist of “non-personally identifiable data” collected, compiled, or produced during the normal course of governing. It will be released under an unrestricted license, meaning it is freely available for everyone to use, reuse, or distribute, though citations will be required.

For detailed information and the full terms and conditions, you can visit the official website.

Exception in thread "main" java.lang.NoSuchFieldError: type at org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_CREATE(

So you have successfully configured Hadoop, everything is running perfectly fine, and you decide to give Hive a try. But as soon as you try to create your very first table, you find yourself looking at something like this:

Exception in thread "main" java.lang.NoSuchFieldError: type
        at org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_CREATE(
        at org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(
        at org.antlr.runtime.Lexer.nextToken(
        at org.antlr.runtime.BufferedTokenStream.fetch(
        at org.antlr.runtime.BufferedTokenStream.sync(
        at org.antlr.runtime.CommonTokenStream.setup(
        at org.antlr.runtime.CommonTokenStream.LT(
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(
        at org.apache.hadoop.hive.ql.Driver.compile(
        at org.apache.hadoop.hive.ql.Driver.compile(
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(
        at org.apache.hadoop.hive.cli.CliDriver.processLine(
        at org.apache.hadoop.hive.cli.CliDriver.main(
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.lang.reflect.Method.invoke(
        at org.apache.hadoop.util.RunJar.main(

No need to worry. It's related to the antlr-*.jar present inside your HIVE_HOME/lib directory. Just make sure you don't have any other antlr-*.jar on your classpath. If it still doesn't work, download the latest version from the ANTLR website and put it inside your HIVE_HOME/lib. Restart Hive and you are good to go...
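A quick way to hunt for conflicting jars is to list what Hive bundles and then scan the Hadoop classpath for any other ANTLR copy. This is only a sketch: it assumes HIVE_HOME is set (a fallback path is used otherwise) and that the hadoop command is on your PATH.

```shell
# List the antlr jar(s) that Hive itself ships with.
# Assumes HIVE_HOME points at your Hive install; adjust if yours differs.
echo "antlr jars bundled with Hive:"
ls "${HIVE_HOME:-/usr/local/hive}"/lib/antlr-*.jar 2>/dev/null

# Look for any other antlr jar lurking on the Hadoop classpath.
# 'hadoop classpath' prints a colon-separated list; split it into lines
# and filter for antlr entries. More than one match means a conflict.
echo "antlr jars elsewhere on the Hadoop classpath:"
hadoop classpath 2>/dev/null | tr ':' '\n' | grep -i 'antlr' || true
```

If the second listing turns up an ANTLR jar outside HIVE_HOME/lib, remove it from the classpath (or delete the stale copy) so only Hive's own version is loaded.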

NOTE: If you want to see how to configure Hadoop, you can go here.
