Posts

Showing posts from 2013

How to run Hive queries through Hive Web Interface.

One of the things I really like about Hadoop, and its related projects, is the WebUI provided to us. It makes our life a lot easier. Just point your web browser to the appropriate URL and quickly perform the desired action, be it browsing through HDFS files or glancing over HBase tables. Otherwise you need to go to the shell and issue the associated commands one by one for each action [I know I'm a bit lazy ;)]. Hive is no exception and provides us a WebUI, called the Hive Web Interface, or HWI in short. But somehow I feel it is less documented and talked about compared to the HDFS and HBase WebUIs. That doesn't make it any less useful, though. In fact I personally find it quite helpful. With its help you can do various operations like browsing your DB schema, seeing your sessions, querying your tables etc. You can also see the System and User variables like the Java Runtime, your OS architecture, your PATH etc. OK, enough brand building. Let's get started and see how to
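As a quick taste of what the post walks through, here is a minimal sketch of bringing HWI up. The property names below come from the standard Hive configuration of that era; the port and war-file path are the usual defaults, but verify them against your own Hive version.

```shell
# Start the Hive Web Interface service; by default it listens on port 9999.
# Override the listen host/port on the command line if needed (property
# names as in hive-default.xml; confirm for your Hive version):
hive --service hwi \
  --hiveconf hive.hwi.listen.host=0.0.0.0 \
  --hiveconf hive.hwi.listen.port=9999

# Then point your browser at:
#   http://localhost:9999/hwi
```

If Hive complains that it cannot find the HWI war file, check that `hive.hwi.war.file` in your hive-site.xml points at the `hive-hwi-<version>.war` shipped inside Hive's lib directory.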

Visualizing Pig Queries Through Lipstick

Quite often while working with Pig you would have reached a situation wherein you found that your Pig scripts have reached such a level of complexity that the flow of execution, and its relation to the MapReduce jobs being executed, has become difficult to visualize. This eventually ends up demanding additional effort to develop, maintain, debug, and monitor the execution of scripts. But not anymore. Thankfully, Netflix has developed a tool that enables developers to visualize and monitor the execution of their data flows at a logical level, and they call it Lipstick. As an implementation of PigProgressNotificationListener, Lipstick piggybacks on top of all Pig scripts executed in our environment, notifying a Lipstick server of job executions and periodically reporting progress as the script executes. Lipstick has got some really cool features. For instance, once you are at the Lipstick main page you can see all the Pig jobs that are currently running or have
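Since Lipstick hooks in as a PigProgressNotificationListener, attaching it is just a matter of registering that listener when you launch a script. Pig's `pig.notification.listener` property is the standard hook; the listener class name and server-URL property shown here are assumptions on my part, so check the Lipstick README for the exact names your version uses.

```shell
# Sketch: launch a Pig script with a progress-notification listener attached.
# pig.notification.listener is Pig's standard hook; the Lipstick class name
# and lipstick.server.url property are assumptions -- verify in Lipstick's docs.
pig -Dpig.notification.listener=com.netflix.lipstick.listeners.LipstickPPNL \
    -Dlipstick.server.url=http://lipstick-server:9292 \
    myscript.pig
```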

How to install MapR M3 on Ubuntu through Ubuntu Partner Archive.

In a recent post of mine I mentioned the partnership between MapR and Canonical towards an initiative to make Hadoop available natively on Ubuntu through the Ubuntu Partner Archive. Since the package has been released now, I thought of showing how to get it done. Trust me, it's really cool to install Hadoop with just one apt-get install :) First things first. Open your sources.list file and add the MapR repositories to it: deb http://package.mapr.com/releases/v2.1.2/ubuntu/ mapr optional deb http://package.mapr.com/releases/ecosystem/ubuntu binary/ Now, update your repository: sudo apt-get update Note: If it throws any error regarding the MapR repositories, just uncomment the lines which allow us to add software from Canonical's partner repository: ## Uncomment the following two lines to add software from Canonical's ## 'partner' repository. ## This software is not part of Ubuntu, but is offered by Canonical and the ## respective vendors
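The steps above, collected into one runnable sequence. The repository lines come from the post itself; the final package name is an assumption on my part (MapR's Ubuntu docs list several meta-packages), so substitute the package your MapR release documents.

```shell
# 1. Add the MapR repositories (same lines as above):
echo "deb http://package.mapr.com/releases/v2.1.2/ubuntu/ mapr optional" | sudo tee -a /etc/apt/sources.list
echo "deb http://package.mapr.com/releases/ecosystem/ubuntu binary/"     | sudo tee -a /etc/apt/sources.list

# 2. Refresh the package index:
sudo apt-get update

# 3. Install MapR M3 -- the package name below is an assumption;
#    check MapR's install docs for the exact name for your release:
sudo apt-get install mapr-single-node
```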

Hadoop Herd : When to use What...

8 years ago not even Doug Cutting would have thought that the tool he was naming after his kid's soft toy would so soon become a rage and change the way people and organizations look at their data. Today Hadoop and BigData have almost become synonyms for each other. But Hadoop is not just Hadoop now. Over time it has evolved into one big herd of various tools, each meant to serve a different purpose, but glued together they give you a power-packed combo. Having said that, one must be careful while choosing these tools for their specific use case, as one size doesn't fit all. What is working for someone else might not be that productive for you. So, here I am trying to show you which tool should be picked in which scenario. It's not a big comparative study but a short intro to some very useful tools. And I am really not an expert or an authority, so there is always some scope for suggestions. Please feel free to comment or suggest if you have any. I wou

Hadoop+Ubuntu : The Big Fat Wedding.

Now, here is a treat for all you Hadoop and Ubuntu lovers. Last month, Canonical , the organization behind the Ubuntu operating system, partnered with MapR , one of the Hadoop heavyweights, in an effort to make Hadoop available as an integrated part of Ubuntu through its repositories. The partnership announced that MapR's M3 Edition for Apache Hadoop will be packaged and made available for download as an integrated part of the Ubuntu operating system. Canonical and MapR are also working to develop a Juju Charm that can be used by OpenStack and other customers to easily deploy MapR into their environments. The free MapR M3 Edition includes HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume and other Hadoop-related components for unlimited production use. MapR M3 will be bundled with Ubuntu 12.04 LTS and 12.10 via the Ubuntu Partner Archive. MapR also announced that the source code for the component packages of the MapR Distribution for Apache Hadoop is now publicly available on Gi

Is your data really Big(Data)??

The advent of so many noticeable tools and technologies for handling BigData problems has made the lives of a lot of people and organizations easier. A lot of these are open source, have good support, a good community, and are pretty active. But there is another aspect to it. When things become easy, free, well supported and abundant, we often start to over-utilize them. Having said that, I would like to share one incident. We organize Hadoop meetups here in Bangalore (India). In one of the initial meetings we just decided to exchange views with each other on how we are using Hadoop and other related projects. There I noticed that a lot of folks were either using or planning to use Hadoop for problems which could easily be solved using traditional systems. In fact, they could be solved in a much better and more efficient way. There was absolutely no need to use Hadoop for these kinds of problems. So, it raised a question in my mind. The question was, are we really getting the '

Happy Birthday Hadoop

Although I am a bit late, it is still worth wishing the most significant 'Computer Science Thing' I have known since I got my computer science senses. You might find me biased towards Hadoop, but I am actually helpless when it comes to Hadoop. I started my career as a Hadoop developer, so I'll always have that 'first love' kind of feeling for Hadoop. Back in 2004, not even Doug Cutting would have thought that Hadoop would so quickly grow into one of the most powerful computing platforms, when he started to work on a platform for distributed storage and processing, after getting inspired by those 2 great papers from Google on GFS (Google File System) and MapReduce, a platform which he later named 'Hadoop' after his kid's toy elephant. And here we are today. It was mid 2006 when I heard about Hadoop for the first time at an Open Source Conference held here in Bangalore (India). But I never knew at that time that this is the piece of technology that is going to

MapReduce jobs running through Eclipse don't appear in the JobTracker Web UI at 50030

Hello all,       In response to an earlier post of mine, which shows how to run a MapReduce job through the Eclipse IDE, I quite frequently receive comments that users are not able to see the status of the MapReduce job they are currently running on the JobTracker Web UI. The trick is very simple. Just add the following 2 lines in your code where you are doing all the configuration, something like this:

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://localhost:9000");
conf.set("mapred.job.tracker", "localhost:9001");

This should do the trick for you. After doing this, just point your web browser to the JobTracker Web UI at localhost:50030. **Modify the hostname and port address as per your configuration. To know about Hadoop configuration and setup you can go to this link. It shows the entire process in detail. HTH

Unable To Connect Your Phone In VirtualBox Through USB Cable??

Recently I came to know about the Premium Suite for the Samsung Galaxy Note. And being a proud owner of this great device, it was quite obvious that I wanted to take pleasure in this. So, I thought of upgrading my phone through Samsung Kies. But I have been working on Linux for the last couple of years, so I got kind of stuck, as Kies doesn't come for Linux. So, I installed Oracle VirtualBox on my Ubuntu box and installed Windows 8 in it. After that I quickly installed Kies in it. But, to my surprise, I was not able to connect my phone. After some diagnosis I found the error message shown below: Failed to access the USB subsystem. VirtualBox is not currently allowed to access USB devices. You can change this by adding your user to the 'vboxusers' group. Please see the user manual for a more detailed explanation. And the detailed message was: NS_ERROR_FAILURE (0x00004005) Component:  Host Interface:  IHost {dab4a2b8-c735-4f08-94fc-9bec84182e2f} Callee:  IMachine {5eaa
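The fix is exactly what the error message suggests: add your user to the 'vboxusers' group so VirtualBox is allowed to access USB devices, then log out and back in for the group change to take effect.

```shell
# Add the current user to the vboxusers group (created by the VirtualBox install):
sudo usermod -aG vboxusers $USER

# Verify the membership -- it only takes effect after you log out and back in:
groups $USER
```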

How to Benchmark HBase Using YCSB

YCSB (Yahoo Cloud Serving Benchmark) is a popular tool for evaluating the performance of different key-value and cloud serving stores. You can use it to test the read/write performance of your HBase cluster, and trust me, it's very effective. In this post I'll show you how to build and use YCSB for your particular version of HBase. So, this is just about setting up and using YCSB and not about YCSB itself. For detailed info on YCSB you can go to the links below: 1- GitHub YCSB page: https://github.com/brianfrankcooper/YCSB 2- The paper from the ACM Symposium on Cloud Computing, "Benchmarking Cloud Serving Systems with YCSB": http://research.yahoo.com/files/ycsb.pdf So, let us get started... Step 1- Clone the YCSB git repository: apache@hadoop:~$ git clone http://github.com/brianfrankcooper/YCSB.git This will create a directory called YCSB inside your current directory. (It might take some time depending on your internet connection speed. So
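For the impatient, the whole flow can be sketched in a few commands. The binding name (`hbase`) and the `columnfamily` parameter follow YCSB's HBase binding conventions; the table and column-family names are assumptions here, so match them to whatever you pre-create in your cluster.

```shell
# Build YCSB from source (needs Maven and a JDK):
git clone http://github.com/brianfrankcooper/YCSB.git
cd YCSB
mvn clean package

# Load phase: insert the records that workload A defines.
# 'columnfamily' must match a column family of the target HBase table (assumption: 'family').
bin/ycsb load hbase -P workloads/workloada -p columnfamily=family

# Run phase: execute the mixed read/update workload and report throughput/latency:
bin/ycsb run hbase -P workloads/workloada -p columnfamily=family
```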

How to install Java 6 on Ubuntu

In one of my previous posts I have shown you how to install Sun (Oracle) Java on Ubuntu through its repository. This will, by default, install Java 7 on your machine, as Ubuntu 12.04 (and onwards) has Java 7 in its repository. But sometimes you may come across a situation wherein you need some specific version of Java. For example, it is advisable to use Java 6 while trying to configure or use Apache Hadoop. In such a scenario you need to download the appropriate version of Java and install it manually. It is again a straightforward process. Just follow the steps below. Note: Java 6 has been taken here, as an example, on a machine running Ubuntu 12.10. Step 1: Download the required version of Java from the official download page. It will download jdk-6u38-linux-x64.bin inside your Downloads directory. Step 2: Go to the directory where the JDK was downloaded (Downloads here) and make it an executable file using this command: apache@hadoop:~/Downloads$ sudo chmod +x j
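The remaining steps, sketched end to end for the `jdk-6u38-linux-x64.bin` installer mentioned above. The `/usr/lib/jvm` location and the `update-alternatives` priority are common conventions rather than anything mandated, so adjust to taste.

```shell
cd ~/Downloads
sudo chmod +x jdk-6u38-linux-x64.bin
./jdk-6u38-linux-x64.bin                 # self-extracts into jdk1.6.0_38/

# Move the JDK to the conventional location for JVMs on Ubuntu:
sudo mkdir -p /usr/lib/jvm
sudo mv jdk1.6.0_38 /usr/lib/jvm/

# Register it with the alternatives system (priority 1 is arbitrary),
# then pick it interactively and confirm:
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.6.0_38/bin/java 1
sudo update-alternatives --config java
java -version
```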

Salesforce.com's Phoenix: SQL layer for your HBase

Ever wished to have the ability to write SQL queries over your data stored in HBase? I know your answer is gonna be Hive. But I am talking about something which doesn't incur heavy start-up costs and which is based on native HBase APIs rather than going through the MapReduce framework. Need not worry. Salesforce.com comes to the rescue this time. Salesforce.com has recently announced Phoenix, an SQL layer over HBase. What do I mean by that??? Phoenix is an SQL layer over HBase delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Phoenix takes our SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. Cool, isn't it? Phoenix doesn't depend on MapReduce, but that doesn't mean it doesn't believe in the philosophy of bringing computation closer to data. It very well does that, through: Coprocessors: To perform operations on t
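To make "SQL compiled into HBase scans" concrete, here is a quick sketch using the `sqlline.py` shell that ships with Phoenix. The ZooKeeper quorum host and the table/column names are placeholders of mine, not anything from the Phoenix docs verbatim.

```shell
# Connect to HBase through Phoenix's JDBC driver; the argument is your
# ZooKeeper quorum (assumption: a local single-node setup):
bin/sqlline.py localhost

# At the prompt, ordinary SQL gets compiled into HBase scans, e.g.:
#   CREATE TABLE stock (symbol VARCHAR PRIMARY KEY, price DECIMAL);
#   UPSERT INTO stock VALUES ('HDP', 1.0);   -- Phoenix uses UPSERT, not INSERT
#   SELECT * FROM stock;
```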

Audi's Autonomous Piloted Parking : Real Life Transformers???

Ever wished to have a car which would make you feel like 007? NO??? Need not worry. Soon you will be able to do that (hopefully ;) ). How?? Keep on reading... At the Consumer Electronics Show (CES), which was held in Las Vegas a few days ago, Audi showed something which would definitely blow your mind. CES is considered the world's most important electronics show. There, Audi showed several new technological aspects, one of them through this self-driven Audi A7 showing off the brand's piloted parking system. It reminds me of the James Bond movie Tomorrow Never Dies, where Bond would operate his BMW 750iL with a remote control. Here we have Annie Lien (Senior Engineer, Electronics Lab at Volkswagen Group of America), who shows us a cool demo of this awesome feature. She directs an Audi A7 using her phone and it parks itself at the Mandarin Oriental Hotel Las Vegas, and on leaving the hotel she directs it back out of the parking to her. They call it Park Assist. And this is what Aud

Google Spanner : The Future Of NoSQL

Quite often, while working with HBase, I used to feel how cool it would be to have a database that can replicate my data to datacenters across the world consistently, so that I can take the pleasure of global availability and geographic locality. And also one which will save my data even in case of some catastrophe or natural disaster. Which supports general-purpose transactions and provides a SQL-based query language. And which has the features of an SQL database as well. But it was only recently that I found out that it is not imagination anymore. I was sitting with a senior+friend of mine at a Cafe Coffee Day nearby, having a casual chat on BigData stuff. During the discussion he told me about something called SPANNER. (You might be wondering why the heck I have emphasized the word spanner so much. Believe me, you will do the same after reading this post.) After that meeting I almost forgot about that incident. Out of the blue, the word spanner flashed back to my mind

This is what 128GB of RAM looks like

Isn't it awesome?????

Premium Suite For Samsung Galaxy Note

If you are envious of your friends or colleagues who are flaunting their new Galaxy Note II with the awesome Jelly Bean, you don't have to be anymore. Samsung is there to help you out. They have recently announced the Premium Suite Upgrade for the original Galaxy Note, just like they did for the Galaxy S III some days ago. Although they haven't announced any exact date yet, it is expected to arrive soon. For the most up-to-date info you can always visit the official link. This Premium Upgrade includes all of the latest features, like Multi-Window, along with the latest Android version, Jelly Bean. Here is a list of the cool features that are bundled with the upgrade. 1. Multi-Window: The Multi-Window feature allows us to do multiple tasks on the same screen simultaneously. It not only gives us a great level of comfort but also looks damn cool. 2. Popup Note / Video / Browser: Popup Note helps in writing down notes just by pulling out the S Pen or double