Posts

Showing posts from 2014

Fun with HBase shell

The HBase shell is great, especially while getting yourself familiar with HBase. It provides lots of useful commands with which you can perform trivial tasks like creating tables, putting some test data into them, scanning a whole table, fetching data from a specific row, and so on. Executing help in the HBase shell will give you the list of all the shell commands. If you need help on a specific command, type help "command". For example, help "get" will give you a detailed explanation of the get command. But this post is not about the above stuff. We will try to do something fun here — something which is available, but less known. So get ready, start your HBase daemons, open the HBase shell and get your hands dirty. For those of us who are unaware, the HBase shell is based on JRuby, the Java Virtual Machine-based implementation of Ruby. More specifically, it uses the Interactive Ruby Shell (IRB), which is used to enter Ruby commands and get an immediate…
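For context, the basic commands mentioned above might look like this in an HBase shell session (the table name 'test' and column family 'cf' are just illustrative):

```
hbase> create 'test', 'cf'              # create table 'test' with column family 'cf'
hbase> put 'test', 'row1', 'cf:a', '1'  # put value '1' at row 'row1', column 'cf:a'
hbase> scan 'test'                      # scan the whole table
hbase> get 'test', 'row1'               # fetch data from a specific row
hbase> help "get"                       # detailed help on the get command
```

And because the shell is IRB under the hood, plain Ruby expressions work at the same prompt — which is exactly the door to the "fun" side this post explores.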

List of my top 10 most voted SO answers

Here is a list of my top 10 most voted answers on Stack Overflow. All these questions are related to cloud computing, including discussions on distributed storage and computing tools like Hadoop, HBase etc. I hope you find it useful as others did.

- What is SAAS, PAAS and IAAS? With examples
- When to use HBase and when to use Hive
- Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)
- Comparing Cassandra's CQL vs Spark/Shark queries vs Hive/Hadoop (DSE version)
- Difference between HBase and Hadoop
- How does impala provide faster query response compared to hive
- How can I develop an ASP.NET web application using Hadoop as Database?
- How (in Hadoop) is the data put into map and reduce functions in correct types?
- Why do we need Hadoop passwordless ssh?
- PIG VS HIVE VS Native Map Reduce

Analyzing your data on the fly with Pig through Mortar Watchtower

Let me start by thanking Mortar for developing such an amazing tool. Isn't it really cool to be able to make your Pig development faster without having to write a complete script, run it, wait for local or remote Pig to finish the execution, and only then get your final data? Quite often, when writing a Pig script, I find it very time-consuming to debug what each line of my script is doing. Moreover, the fact that Pig is a dataflow language makes it even more important to have a clear idea of what exactly your data looks like at each step of the flow. This obviously helps in writing compact and efficient scripts. Trust me, you don't want to write inefficient code while dealing with petabytes of data. It's a bitter truth that Hadoop development iterations are slow. Traditional programmers have always had the benefit of recompiling their app, running it, and seeing the results within seconds. They have near-instant validation that what they're building…
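To make the step-by-step nature of a Pig dataflow concrete, here is a minimal sketch (the input file and field names are hypothetical); Pig's built-in DESCRIBE and ILLUSTRATE operators are the stock way to inspect what the data looks like at each step, which is the pain point tools like Watchtower aim to remove:

```pig
-- load a hypothetical tab-separated access log
logs = LOAD 'access_log.tsv' AS (user:chararray, url:chararray, bytes:long);

-- step 1: keep only large responses
big = FILTER logs BY bytes > 1024;
DESCRIBE big;       -- prints the schema of 'big'

-- step 2: total bytes per user
grouped = GROUP big BY user;
totals  = FOREACH grouped GENERATE group AS user, SUM(big.bytes) AS total;
ILLUSTRATE totals;  -- shows sample rows flowing through each step of the pipeline

DUMP totals;
```

Each relation (logs, big, grouped, totals) is one step of the flow; inspecting them one by one is exactly the slow iteration loop the post is describing.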