The first HBase sink was committed to the Flume 1.2.x trunk a few days ago. In this post we'll see how we can use this sink to collect data from a file stored on the local filesystem and dump that data into an HBase table. You'll need Flume built from the trunk in order to follow along. If you haven't built it yet and are looking for some help, you can visit my other post that shows how to build and use Flume-NG at this link :
First of all we have to write the configuration file for our agent. This agent will collect data from the file and dump it into the HBase table. A simple configuration file might look like this :
hbase-agent.sources = tail
hbase-agent.sinks = sink1
hbase-agent.channels = ch1
hbase-agent.sources.tail.type = exec
hbase-agent.sources.tail.command = tail -F /home/mohammad/demo.txt
hbase-agent.sources.tail.channels = ch1
hbase-agent.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
hbase-agent.sinks.sink1.channel = ch1
hbase-agent.sinks.sink1.table = demo
hbase-agent.sinks.sink1.columnFamily = cf
hbase-agent.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
hbase-agent.sinks.sink1.serializer.payloadColumn = col1
hbase-agent.sinks.sink1.serializer.keyType = timestamp
hbase-agent.sinks.sink1.serializer.rowPrefix = 1
hbase-agent.sinks.sink1.serializer.suffix = timestamp
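The last three serializer properties control how row keys are built: with rowPrefix = 1 and suffix = timestamp, each event's row key is the prefix followed by the current time in milliseconds. A rough shell sketch of the idea (my own illustration, not Flume code):

```shell
# Approximation of SimpleHbaseEventSerializer's row key with
# rowPrefix=1 and suffix=timestamp: prefix + epoch milliseconds.
# (%3N truncates nanoseconds to milliseconds; requires GNU date.)
rowkey="1$(date +%s%3N)"
echo "$rowkey"
```

This matches the row keys like 11339770815331 you'll see in the scan output below: a leading 1 followed by a 13-digit millisecond timestamp.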
Save this configuration in a file called hbase-agent.conf inside the conf directory of your Flume distribution. Now start Hadoop and HBase, and create a table called demo with a column family called cf. Then open another terminal, change to your Flume home directory, and start the agent with the following command :
$ bin/flume-ng agent -n hbase-agent -c conf/ -f conf/hbase-agent.conf
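For reference, the demo table mentioned above can be created from the HBase shell like this (assuming HBase is already running with default settings):

```
hbase(main):001:0> create 'demo', 'cf'
```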
Now go back to your HBase shell and scan the demo table. If everything went well, you should see something like this :
hbase(main):004:0> scan 'demo'
11339770815331 column=cf:col1, timestamp=1339770818340, value=value1
11339770815332 column=cf:col1, timestamp=1339770818342, value=value6
2 row(s) in 0.0500 seconds
NOTE : I have used a small text file called demo.txt here, which has the following few lines in it :
We all know how cool Spark is when it comes to fast, general-purpose cluster computing. Apart from the core APIs Spark also provides a rich ...
Hive is a wonderful tool for those who like to perform batch operations to process their large amounts of data residing on a Hadoop cluster ...
SSH (Secure Shell) is a network protocol secure data communication, remote shell services or command execution and other secure network ser...
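Because the source runs tail -F, the agent keeps shipping new lines to HBase as they are appended to the file while it is running. For example, you can append a couple of lines like this (the path here is relative for illustration; adjust it to match the one in hbase-agent.conf):

```shell
# Append two more lines to the tailed file; tail -F picks each
# one up and the agent writes it to the demo table as a new row.
DEMO=demo.txt   # use the same path as in hbase-agent.conf
echo "Another line for the demo table" >> "$DEMO"
echo "And one more" >> "$DEMO"
tail -n 2 "$DEMO"
```

Scanning the demo table again afterwards should show two additional rows.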