Thursday, June 21, 2012

HOW TO CONFIGURE HBASE IN PSEUDO DISTRIBUTED MODE ON A SINGLE LINUX BOX

If you have successfully configured Hadoop on a single machine in pseudo-distributed mode and are looking for some help using HBase on top of it, you may find this writeup useful. Please let me know if you face any issues.

Since you are able to use Hadoop, I am assuming you have all the pieces in place, so we'll start directly with the HBase configuration. Please follow the steps shown below:

1 - Download the HBase release from one of the mirrors using the link shown below. Then extract it at some convenient location (I'll call this location HBASE_HOME from now on) -
http://apache.techartifact.com/mirror/hbase/
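
As a rough sketch of step 1, assuming the release was downloaded as a .tar.gz archive: the helper name unpack_hbase, the archive name and the target directory below are only illustrative and are not part of HBase.

```shell
#!/bin/sh
# Sketch: unpack an HBase release tarball and print the resulting HBASE_HOME.
# unpack_hbase is a hypothetical helper; the archive name and target
# directory you pass in are your own choices.
unpack_hbase() {
    archive="$1"      # e.g. hbase-0.90.4.tar.gz
    target="$2"       # directory to unpack into
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target" || return 1
    # The top-level directory inside the archive becomes HBASE_HOME.
    top=$(tar -tzf "$archive" | head -n 1 | sed 's|^\./||' | cut -d/ -f1)
    echo "$target/$top"
}
```

It could be used as HBASE_HOME=$(unpack_hbase ~/Downloads/hbase-0.90.4.tar.gz "$HOME"), where the download path is again an assumption.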

2 - Go to the /conf directory inside the extracted HBASE_HOME and make these changes :

     - In the hbase-env.sh file modify these lines as shown :
       export JAVA_HOME=/usr/lib/jvm/java-6-sun
       export HBASE_REGIONSERVERS=/PATH_TO_YOUR_HBASE_FOLDER/conf/regionservers
       export HBASE_MANAGES_ZK=true
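
To double-check the hbase-env.sh edits, something like the following sketch could help. It only looks for uncommented export lines for the three variables; check_hbase_env is a hypothetical helper name, not an HBase tool.

```shell
#!/bin/sh
# Sketch: report which of the three variables from step 2 are exported
# (uncommented) in a given hbase-env.sh file.
# check_hbase_env is a hypothetical helper.
check_hbase_env() {
    env_file="$1"
    missing=0
    for var in JAVA_HOME HBASE_REGIONSERVERS HBASE_MANAGES_ZK; do
        # Commented-out lines start with '#', so they do not match.
        if grep -Eq "^[[:space:]]*export[[:space:]]+$var=" "$env_file"; then
            echo "ok: $var"
        else
            echo "missing: $var"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as check_hbase_env "$HBASE_HOME/conf/hbase-env.sh"; a non-zero exit status means at least one variable is still commented out or absent.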


     - In the hbase-site.xml file add these properties :
       <property>
           <name>hbase.rootdir</name>
           <value>SAME VALUE AS YOUR fs.default.name IN core-site.xml/hbase</value>
       </property>
       <property>
           <name>hbase.cluster.distributed</name>
           <value>true</value>
       </property>
       <property>
           <name>hbase.zookeeper.quorum</name>
           <value>localhost</value>
       </property>
       <property>
           <name>dfs.replication</name>
           <value>1</value>
       </property>
       <property>
           <name>hbase.zookeeper.property.clientPort</name>
           <value>2181</value>
       </property>
       <property>
           <name>hbase.zookeeper.property.dataDir</name>
           <value>/home/mohammad/hbase/zookeeper</value>
       </property>
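
The hbase.rootdir value is simply your fs.default.name value with /hbase appended. A minimal sketch of that, where hbase_rootdir is a hypothetical helper and hdfs://localhost:9000 is only an example value (use whatever your core-site.xml actually contains):

```shell
#!/bin/sh
# Sketch: build the hbase.rootdir value from the fs.default.name value.
# hbase_rootdir is a hypothetical helper; hdfs://localhost:9000 is an
# assumed example, not a value taken from this post.
hbase_rootdir() {
    fs_default_name="$1"
    # Drop a trailing slash, if any, then append /hbase.
    echo "${fs_default_name%/}/hbase"
}

hbase_rootdir "hdfs://localhost:9000"    # prints hdfs://localhost:9000/hbase
```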

3 - Now copy the hadoop-core-*.jar from your HADOOP_HOME and the commons-collections-3.2.1.jar from the HADOOP_HOME/lib folder into your HBASE_HOME/lib folder.
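
Step 3 could be sketched roughly as below. copy_hadoop_jars is a hypothetical helper, and both paths are whatever your own HADOOP_HOME and HBASE_HOME happen to be.

```shell
#!/bin/sh
# Sketch of step 3: copy the two Hadoop jars into HBASE_HOME/lib.
# copy_hadoop_jars is a hypothetical helper; pass your own HADOOP_HOME
# and HBASE_HOME paths as arguments.
copy_hadoop_jars() {
    hadoop_home="$1"
    hbase_home="$2"
    # hadoop-core-*.jar sits directly under HADOOP_HOME.
    cp "$hadoop_home"/hadoop-core-*.jar "$hbase_home/lib/" || return 1
    # commons-collections-3.2.1.jar sits under HADOOP_HOME/lib.
    cp "$hadoop_home/lib/commons-collections-3.2.1.jar" "$hbase_home/lib/" || return 1
}
```

If HBASE_HOME/lib already contains a hadoop-core jar of a different version, you may want to move it aside first so that only one version ends up on the classpath.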

4 - In your /etc/hosts file, change the address 127.0.1.1 to 127.0.0.1. Since this file is protected, we need to open it with root privileges. Use the command shown below to do that :
mohammad@ubuntu:~$ sudo gedit /etc/hosts
This will open the /etc/hosts file with root privileges. Modify it, and save it.
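
If you prefer not to edit the file by hand, the same change can be previewed with sed. This is only a sketch: fix_hosts is a hypothetical helper that prints the rewritten file to standard output and never touches the original.

```shell
#!/bin/sh
# Sketch: print a hosts file with a leading 127.0.1.1 address rewritten to
# 127.0.0.1. fix_hosts is a hypothetical helper; it only writes to stdout
# and never modifies the input file.
fix_hosts() {
    sed 's/^127\.0\.1\.1\([[:space:]]\)/127.0.0.1\1/' "$1"
}
```

For example, fix_hosts /etc/hosts > /tmp/hosts.new writes the result to a scratch file (the /tmp/hosts.new name is an assumption); after reviewing it you can copy it over /etc/hosts with sudo.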

5 - Now go to the terminal, change the directory to your HBASE_HOME and issue this command to start the HBase processes (your Hadoop processes must already be running though) :

mohammad@ubuntu:~/hbase-0.90.4$ bin/start-hbase.sh 


If everything was fine then you will see something like this on your terminal :

localhost: starting zookeeper, logging to /home/mohammad/hbase/logs/hbase-mohammad-zookeeper-ubuntu.out

starting master, logging to /home/mohammad/hbase/logs/hbase-mohammad-master-ubuntu.out

localhost: starting regionserver, logging to /home/mohammad/hbase/logs/hbase-mohammad-regionserver-ubuntu.out

mohammad@ubuntu:~/hbase-0.90.4$ 



NOTE : To verify whether your HBase processes are running or not, do this on your terminal :

mohammad@ubuntu:~/hbase-0.90.4$ jps

This will list all the Java processes currently running. If your HBase is running fine you will see something like this :

mohammad@ubuntu:~/hbase-0.90.4$ jps
4585 SecondaryNameNode
4055 NameNode
5674 HRegionServer
5439 HMaster
5388 HQuorumPeer
4939 TaskTracker
4326 DataNode
5852 Jps
4686 JobTracker
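
The same check can be scripted. The sketch below reads jps output on standard input and looks for the three HBase daemons from the listing above; check_hbase_procs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `jps` output on stdin and report whether the three HBase
# daemons (HMaster, HRegionServer, HQuorumPeer) are present.
# check_hbase_procs is a hypothetical helper.
check_hbase_procs() {
    jps_output=$(cat)
    status=0
    for proc in HMaster HRegionServer HQuorumPeer; do
        # Match the process name as the whole second field of a jps line.
        if printf '%s\n' "$jps_output" | awk -v p="$proc" '$2 == p { found = 1 } END { exit !found }'; then
            echo "$proc: running"
        else
            echo "$proc: NOT running"
            status=1
        fi
    done
    return "$status"
}
```

Use it as jps | check_hbase_procs; the exit status is non-zero if any of the three is missing.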



HBase also provides a web UI which you can visit to see if everything is OK. Point your web browser to this link :




This will show you the HBase master's web page.



6 - Now go to the HBase shell and get yourself familiar with a few HBase commands. To do this issue the following command :

mohammad@ubuntu:~/hbase-0.90.4$ bin/hbase shell

This command will take you to the HBase shell. You should be able to see this on your terminal screen by now :

mohammad@ubuntu:~/hbase-0.90.4$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011

hbase(main):001:0>

EXAMPLE COMMANDS :

1 - List the existing tables :

hbase(main):001:0> list

2 - To create a new table called table1 having a column family cf :

hbase(main):001:0> create 'table1', 'cf'
0 row(s) in 1.9500 seconds

hbase(main):002:0> list
TABLE                                                                                                                                                  
demo                                                                                                                                                   
table1                                                                                                                                                 
test3                                                                                                                                                  
3 row(s) in 0.0310 seconds

hbase(main):003:0>
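
Once table1 exists you can try a few more HBase shell commands on it, such as put, get and scan. The sketch below just emits them as text; demo_commands is a hypothetical helper, and the row key 'row1', the column 'cf:a' and the value 'value1' are made-up sample data.

```shell
#!/bin/sh
# Sketch: emit a few more HBase shell commands for the table created above.
# demo_commands is a hypothetical helper; 'row1', 'cf:a' and 'value1' are
# made-up sample data, not values from this post.
demo_commands() {
    cat <<'EOF'
put 'table1', 'row1', 'cf:a', 'value1'
get 'table1', 'row1'
scan 'table1'
exit
EOF
}
```

You can type these at the hbase(main) prompt one by one. Assuming your HBase shell reads commands from standard input, running demo_commands | bin/hbase shell from HBASE_HOME should also execute them in one go.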

** For complete HBase documentation please visit the HBase home page at - http://hbase.apache.org/


7 comments:

  1. really a nice post..helped me a lot in configuring hbase on my machine

  2. Hbase is the Hadoop database. Think of it as a distributed, scalable, big data store.

  3. Tariq, you haven't mentioned command to list all tables

  4. and Thanks this is very helpful...:)

  5. You are always welcome Shiv..and thanks a lot for the pointer.

  6. Hi Tariq,
    Thanks a lot :)
    Really, this post is very helpful.

  7. you are always welcome saket :)

