Friday, June 15, 2012

HOW TO MOVE DATA INTO AN HBASE TABLE USING FLUME-NG

The first HBase sink was committed to the Flume 1.2.x trunk a few days ago. In this post we'll see how to use this sink to collect data from a file stored on the local filesystem and dump that data into an HBase table. You should have Flume built from the trunk in order to follow along. If you haven't built it yet and are looking for some help, you can visit my other post that shows how to build and use Flume-NG at this link:
http://cloudfront.blogspot.in/2012/06/how-to-build-and-use-flume-ng.html

First of all, we have to write the configuration file for our agent. This agent will collect data from the file and dump it into the HBase table. A simple configuration file might look like this:


hbase-agent.sources = tail
hbase-agent.sinks = sink1
hbase-agent.channels = ch1
hbase-agent.sources.tail.type = exec
hbase-agent.sources.tail.command = tail -F /home/mohammad/demo.txt
hbase-agent.sources.tail.channels = ch1
hbase-agent.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
hbase-agent.sinks.sink1.channel = ch1
hbase-agent.sinks.sink1.table = demo
hbase-agent.sinks.sink1.columnFamily = cf
hbase-agent.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
hbase-agent.sinks.sink1.serializer.payloadColumn = col1
hbase-agent.sinks.sink1.serializer.keyType = timestamp
hbase-agent.sinks.sink1.serializer.rowPrefix = 1
hbase-agent.sinks.sink1.serializer.suffix = timestamp
hbase-agent.channels.ch1.type = memory

Save this in a file called hbase-agent.conf inside the conf/ directory of your Flume distribution. Now start Hadoop and HBase, and create a table called demo with a column family called cf. Then open another terminal, change your directory to your Flume home, and start the agent with the command below (the name passed to -n must match the agent name used in the configuration file):
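
Before starting the agent, the table and column family named in the configuration can be created from the HBase shell like this:

```
hbase(main):001:0> create 'demo', 'cf'
```
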
$ bin/flume-ng agent -n hbase-agent -c conf/ -f conf/hbase-agent.conf

Now go back to your HBase shell and scan the demo table. If everything went well, you will see something like this:

hbase(main):004:0> scan 'demo'
ROW                                    COLUMN+CELL                                                                                                     
 11339770815331                        column=cf:col1, timestamp=1339770818340, value=value1                                                           
 11339770815332                        column=cf:col1, timestamp=1339770818342, value=value6                                                           
2 row(s) in 0.0500 seconds
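
The row keys in the scan above follow from the serializer settings: with rowPrefix = 1 and suffix = timestamp, each key is the prefix followed by the event's timestamp in milliseconds. A rough sketch of how such a key is formed (my assumption about the concatenation, based on the keys shown above):

```shell
# Sketch (assumption): SimpleHbaseEventSerializer builds the row key by
# concatenating the configured rowPrefix with an epoch-milliseconds timestamp.
ts=1339770815331        # example timestamp taken from the scan output above
rowkey="1${ts}"         # "1" is the configured rowPrefix
echo "$rowkey"          # -> 11339770815331
```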


NOTE: I used a small text file called demo.txt here, which contains the following lines:

value1
value2
value3
value4
value5
value6
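
If you want to reproduce this, the sample file can be created with a one-liner (adjust the path to match the tail command in hbase-agent.conf):

```shell
# Write the six sample lines to demo.txt; the exec source tails this file.
printf 'value%s\n' 1 2 3 4 5 6 > demo.txt
cat demo.txt
```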

