Monday, February 4, 2013

HOW TO BENCHMARK HBASE USING YCSB

YCSB (Yahoo! Cloud Serving Benchmark) is a popular tool for evaluating the performance of different key-value and cloud serving stores. You can use it to test the read/write performance of your HBase cluster, and in my experience it is very effective. In this post I'll show you how to build and use YCSB for your particular version of HBase. The post is only about setting up and using YCSB, not about YCSB itself. For detailed information on YCSB, see the links below :

1- Github-YCSB page : https://github.com/brianfrankcooper/YCSB
2- The paper from ACM Symposium on Cloud Computing, "Benchmarking Cloud Serving Systems with YCSB" : http://research.yahoo.com/files/ycsb.pdf

So, let us get started...

Step1- Clone the YCSB git repository :

apache@hadoop:~$ git clone http://github.com/brianfrankcooper/YCSB.git

This will create a directory called YCSB inside your current directory. (It might take some time depending on the speed of your internet connection, so be patient.)

Step2- Go inside the newly created YCSB directory and then into the hbase directory. There you will find a file named pom.xml. Open it and edit it so that it looks like this :

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.yahoo.ycsb</groupId>
    <artifactId>root</artifactId>
    <version>0.1.4</version>
  </parent>
  <artifactId>hbase-binding</artifactId>
  <name>HBase DB Binding</name>
  <dependencies>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase</artifactId>
      <!--<version>${hbase.version}</version>-->
      <version>0.94.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <!--<version>1.0.0</version>-->
      <version>1.0.4</version>
    </dependency>
    <dependency>
      <groupId>com.yahoo.ycsb</groupId>
      <artifactId>core</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>${maven.assembly.version}</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <appendAssemblyId>false</appendAssemblyId>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

Pay attention to the two <version> elements that sit directly below the commented-out version lines (they were highlighted in red in the original post): one for hbase and one for hadoop-core. These are the changes that you have to make in order to build YCSB without any problem for your specific version of HBase.

NOTE : As of this writing I am using hadoop-1.0.4 and hbase-0.94.4, so those are the versions in the file shown above. You have to specify the versions that you are going to use.

Step3- Now, go back to your terminal and move inside the YCSB directory :
apache@hadoop:~$ cd YCSB

Step4- It's time to do the build now :
apache@hadoop:~/YCSB$ mvn clean package
This will start the build process. You can see all the information as the build process continues. If everything goes fine then you will see something like this on your terminal :


NOTE: If multiple descriptors or descriptor-formats are provided for this project, the value of this file will be non-deterministic!
[WARNING] Replacing pre-existing project main-artifact file: /hadoop/projects/YCSB/voldemort/target/archive-tmp/voldemort-binding-0.1.4.jar
with assembly file: /hadoop/projects/YCSB/voldemort/target/voldemort-binding-0.1.4.jar
[INFO]                                                                      
[INFO] ------------------------------------------------------------------------
[INFO] Building YCSB Release Distribution Builder 0.1.4
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.3:clean (default-clean) @ ycsb ---
[INFO]
[INFO] --- maven-checkstyle-plugin:2.6:checkstyle (validate) @ ycsb ---
[INFO]
[INFO] --- maven-assembly-plugin:2.2.1:single (default) @ ycsb ---
[INFO] Reading assembly descriptor: src/main/assembly/distribution.xml
[INFO] Processing sources for module project: com.yahoo.ycsb:core:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:cassandra-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:hbase-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:hypertable-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:dynamodb-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:elasticsearch-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:infinispan-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:jdbc-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:mapkeeper-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:mongodb-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:orientdb-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:redis-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:voldemort-binding:jar:0.1.4
[INFO] Processing sources for module project: com.yahoo.ycsb:ycsb:pom:0.1.4
[INFO] Building tar : /hadoop/projects/YCSB/distribution/target/ycsb-0.1.4.tar.gz
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] YCSB Root ......................................... SUCCESS [1.940s]
[INFO] Core YCSB ......................................... SUCCESS [23.149s]
[INFO] Cassandra DB Binding .............................. SUCCESS [7.421s]
[INFO] HBase DB Binding .................................. SUCCESS [15.638s]
[INFO] Hypertable DB Binding ............................. SUCCESS [2.805s]
[INFO] DynamoDB DB Binding ............................... SUCCESS [3.451s]
[INFO] ElasticSearch Binding ............................. SUCCESS [8.123s]
[INFO] Infinispan DB Binding ............................. SUCCESS [2:27.468s]
[INFO] JDBC DB Binding ................................... SUCCESS [18.235s]
[INFO] Mapkeeper DB Binding .............................. SUCCESS [10.011s]
[INFO] Mongo DB Binding .................................. SUCCESS [4.874s]
[INFO] OrientDB Binding .................................. SUCCESS [19.702s]
[INFO] Redis DB Binding .................................. SUCCESS [3.960s]
[INFO] Voldemort DB Binding .............................. SUCCESS [14.181s]
[INFO] YCSB Release Distribution Builder ................. SUCCESS [7.076s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4:48.305s
[INFO] Finished at: Mon Feb 04 01:13:00 IST 2013
[INFO] Final Memory: 107M/737M
[INFO] ------------------------------------------------------------------------

This shows that the build has been completed successfully and you are all set to go. 

Step5- Step4 will create a directory named target inside your YCSB/distribution/ directory. You will find the YCSB tar file there, ycsb-0.1.4.tar.gz in my case. Copy this file to a location of your choice and extract it. This will give you the ycsb-0.1.4 directory, which contains everything you need.
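
Step 5 boils down to one extract command. Here is a minimal sketch; the function name and the example destination directory are assumptions made for illustration, so adjust them to your setup:

```shell
# Unpack a YCSB distribution tarball into a destination directory.
# Usage: unpack_ycsb <tarball> <dest_dir>
unpack_ycsb() {
  tarball=$1
  dest=$2
  mkdir -p "$dest" || return 1
  # -x extract, -z gunzip, -f archive file, -C directory to extract into
  tar -xzf "$tarball" -C "$dest"
}

# Example, run from the YCSB source directory (destination is hypothetical):
# unpack_ycsb distribution/target/ycsb-0.1.4.tar.gz "$HOME/benchmarks"
```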

Step6- Move inside the ycsb-0.1.4 directory, where you will find a directory called hbase-binding. Go inside hbase-binding and open the lib directory there. Copy the following jars from your HBASE_HOME/lib into this lib directory :
     1- slf4j-api-*.jar
     2- slf4j-log4j12-*.jar
     3- zookeeper-*.jar

Step7- You will find another directory named conf inside hbase-binding. It contains a file named hbase-site.xml. Replace this file with the hbase-site.xml present in your HBASE_HOME/conf directory.
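
Steps 6 and 7 can also be scripted. The sketch below assumes the directory layout described above (lib and conf under both the HBase installation and hbase-binding); the function name is made up for illustration and is not part of YCSB or HBase:

```shell
# Copy the jars and the site config that the HBase binding needs.
# Usage: prepare_hbase_binding <hbase_home> <ycsb_dir>
prepare_hbase_binding() {
  hbase_home=$1
  binding=$2/hbase-binding

  # Step 6: slf4j and zookeeper jars from HBase's lib directory
  for pattern in 'slf4j-api-*.jar' 'slf4j-log4j12-*.jar' 'zookeeper-*.jar'; do
    # $pattern is left unquoted on purpose so that the shell expands the glob
    cp "$hbase_home"/lib/$pattern "$binding/lib/" || return 1
  done

  # Step 7: overwrite the bundled hbase-site.xml with the cluster's own copy
  cp "$hbase_home/conf/hbase-site.xml" "$binding/conf/hbase-site.xml"
}

# Example (both paths are assumptions):
# prepare_hbase_binding "$HBASE_HOME" "$HOME/benchmarks/ycsb-0.1.4"
```

The function stops at the first jar pattern that matches nothing, which makes a wrong HBase path easy to notice.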

Step8- You are all set for testing your HBase now. Start the Hadoop and HBase processes and go inside ycsb-0.1.4. Now, issue the following command to load test your HBase deployment :
apache@hadoop:/ycsb-0.1.4$ bin/ycsb load hbase -P workloads/workloada -p columnfamily=f1 -p recordcount=1000000 -p threadcount=4 -s | tee -a workloada.dat
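
YCSB writes to a table named usertable by default, and, as one of the comments below points out, that table has to exist before the load starts. Here is a minimal sketch, assuming the hbase launcher script is on your PATH and using the f1 column family from the command above; both helper names are hypothetical:

```shell
# Print the HBase shell statement that creates a table with one column family.
# Usage: create_table_stmt <table> <column_family>
create_table_stmt() {
  printf "create '%s', '%s'\n" "$1" "$2"
}

# Feed the statement to the HBase shell (needs a running HBase).
create_usertable() {
  create_table_stmt usertable f1 | hbase shell
}

# Run this once before the load command:
# create_usertable
```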

This will start the load test, and after some time it will give you the result summary. Do not be overwhelmed by the amount of information displayed on your terminal. For convenience we have piped the ycsb command into the Linux tee command, which writes the output both to the terminal and to workloada.dat. You will find this file inside your ycsb-0.1.4 directory, with the same content as your terminal. You can extract useful figures from this file (or from your terminal), such as :
The overall runtime in milliseconds
Throughput, i.e. operations per second
Number of operations
Average latency, and so on

Here are some of the lines from my terminal :
[OVERALL], RunTime(ms), 73258.0
[OVERALL], Throughput(ops/sec), 13650.386305932458
[UPDATE], Operations, 4
[UPDATE], AverageLatency(us), 530564.25
[UPDATE], MinLatency(us), 65895
[UPDATE], MaxLatency(us), 1642179
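
Since every summary line has the form "[SECTION], Metric, value", single figures can be pulled out of workloada.dat with awk. This is a small sketch; the function name is an assumption and not part of YCSB:

```shell
# Print the value of one metric from a YCSB result file.
# Usage: ycsb_metric <file> <section> <metric>
ycsb_metric() {
  awk -F', ' -v section="[$2]" -v metric="$3" '
    # field 1 is the bracketed section, field 2 the metric name, field 3 the value
    $1 == section && $2 == metric { print $3 }
  ' "$1"
}

# Example:
# ycsb_metric workloada.dat OVERALL "Throughput(ops/sec)"
```

Because tee -a appends to the file, repeated runs leave several matching lines in it, and the function prints one value per run.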

I hope you found this post helpful. Stay connected for more :)

10 comments:

  1. These are great instructions, with every detail laid out step by step. I started to set up YCSB yesterday and was looking for a guide like this. I have to say this is my lucky day, finding this post just one day after it was published. Thank you!

    1. You are always welcome. I would like to hear from you whether it really worked for you or not.

  2. May I ask a dumb question here? I get the following exception. HBase is running standalone with 1 server. Many thanks.

    I ran step 8 with a small modification (recordcount = 1000, and threadcount=4), but got an error. Here is the log:
    -----------------------------
    YCSB Client 0.1
    Command line: -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=f1 -p recordcount=1000 -p threadcount=1 -s -load
    com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1000 actions: DoNotRetryIOException: 1000 times, servers with issues: localhost:60020,
    at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:111)
    at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73)
    at com.yahoo.ycsb.ClientThread.run(Client.java:307)
    Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1000 actions: DoNotRetryIOException: 1000 times, servers with issues: localhost:60020,
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1591)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945)
    at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:106)
    ... 2 more
    [OVERALL], RunTime(ms), 878.0
    [OVERALL], Throughput(ops/sec), 1138.9521640091116
    [INSERT], Operations, 1000
    [INSERT], AverageLatency(us), 652.439
    [INSERT], MinLatency(us), 51
    [INSERT], MaxLatency(us), 489779
    [INSERT], 95thPercentileLatency(ms), 0
    [INSERT], 99thPercentileLatency(ms), 0
    [INSERT], Return=0, 1000
    [INSERT], 0, 995
    [INSERT], 1, 1
    [INSERT], 2, 1
    [INSERT], 3, 1
    [INSERT], 4, 1
    [INSERT], 5, 0
    ....
    ....
    ....
    [INSERT], 997, 0
    [INSERT], 998, 0
    [INSERT], 999, 0
    [INSERT], >1000, 0
    --------------------------------------------

    1. add this property to your hbase-site.xml file and restart your hbase :

      <property>
        <name>zookeeper.session.timeout</name>
        <value>1800000</value>
        <description>Session Time out.</description>
      </property>


      and see if it works for you.

    2. Mohammad,

      Many thanks. I made the change, but it doesn't work; the output is pretty much the same.
      I dug a bit more, and it seems something is wrong with my HBase settings. I put 'http://localhost:60020/' in the browser and got the following:
      ----------------------
      org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version 3 cannot communicate with client version 47
      ----------------------
      By the way, localhost:60010 looks fine, as the browser shows the master's cluster page.

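    3. Here are the port settings from my hbase-site.xml:

      <property>
        <name>hbase.master.port</name>
        <value>60000</value>
      </property>
      <property>
        <name>hbase.master.info.port</name>
        <value>60010</value>
      </property>
      <property>
        <name>hbase.regionserver.port</name>
        <value>60020</value>
      </property>
      <property>
        <name>hbase.regionserver.info.port</name>
        <value>60030</value>
      </property>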

    4. could you please show me your hbase-site.xml. dontariq@gmail.com is my email address.

  3. hello, there. It is great! But I just wondered, can you also write such instructions for voldemort? I mean 'How to configure YCSB to beat voldemort'.

    thanks!

  4. We need to create the usertable also before executing the ycsb command.
