The first HBase sink was committed to the Flume 1.2.x trunk a few days ago. In this post we'll see how we can use this sink to collect data from a file stored in the local filesystem and dump it into an HBase table. You will need Flume built from trunk in order to do that. If you haven't built it yet and are looking for some help, you can visit my other post that shows how to build and use Flume-NG at this link:
http://cloudfront.blogspot.in/2012/06/how-to-build-and-use-flume-ng.html
First of all we have to write the configuration file for our agent. This agent will collect data from the file and dump it into the HBase table. A simple configuration file might look like this:
# Name the source, sink, and channel used by this agent
hbase-agent.sources = tail
hbase-agent.sinks = sink1
hbase-agent.channels = ch1
# Source: run tail -F on the local file and feed each line into ch1 as an event
hbase-agent.sources.tail.type = exec
hbase-agent.sources.tail.command = tail -F /home/mohammad/demo.txt
hbase-agent.sources.tail.channels = ch1
# Sink: write each event's body into column cf:col1 of the 'demo' table,
# with row keys built from the configured rowPrefix and suffix
hbase-agent.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
hbase-agent.sinks.sink1.channel = ch1
hbase-agent.sinks.sink1.table = demo
hbase-agent.sinks.sink1.columnFamily = cf
hbase-agent.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
hbase-agent.sinks.sink1.serializer.payloadColumn = col1
hbase-agent.sinks.sink1.serializer.keyType = timestamp
hbase-agent.sinks.sink1.serializer.rowPrefix = 1
hbase-agent.sinks.sink1.serializer.suffix = timestamp
# Channel: buffer events in memory
hbase-agent.channels.ch1.type = memory
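The memory channel will run with its defaults here, but if you expect heavier traffic you can bound its buffer explicitly. These two properties are standard Flume memory channel settings; the values below are just illustrative, not something this demo requires:
hbase-agent.channels.ch1.capacity = 1000
hbase-agent.channels.ch1.transactionCapacity = 100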
Save this in a file called hbase-agent.conf inside the conf/ directory of your Flume distribution. Now start Hadoop and HBase, and create a table called demo with a column family called cf. Then open another terminal, change your directory to your FlumeHome, and start the agent with the command below:
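In case you haven't created the table yet, it takes a single command in the HBase shell (the table and column family names below simply match the agent configuration above):
hbase(main):001:0> create 'demo', 'cf'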
$ bin/flume-ng agent -n hbase-agent -c conf/ -f conf/hbase-agent.conf
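If nothing seems to happen, it can help to run the agent with its log output on the console. The flume.root.logger property is Flume's standard logging switch, nothing specific to this example:
$ bin/flume-ng agent -n hbase-agent -c conf/ -f conf/hbase-agent.conf -Dflume.root.logger=INFO,console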
Now go back to your HBase shell and scan the demo table. If everything went fine, you will see something like this (each row key is the configured rowPrefix, 1, followed by the timestamp at which the event was written, which is what suffix = timestamp produces):
hbase(main):004:0> scan 'demo'
ROW COLUMN+CELL
11339770815331 column=cf:col1, timestamp=1339770818340, value=value1
11339770815332 column=cf:col1, timestamp=1339770818342, value=value6
2 row(s) in 0.0500 seconds
NOTE: I have taken a small text file called demo.txt here, which has the following few lines in it:
value1
value2
value3
value4
value5
value6
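For reference, you can create this file (and keep feeding it) from another terminal. Since the source uses tail -F, lines appended while the agent is running get picked up too. The path matches the one in the agent configuration, and the value7 line is just an example of a later append:
$ printf 'value1\nvalue2\nvalue3\nvalue4\nvalue5\nvalue6\n' > /home/mohammad/demo.txt
$ echo 'value7' >> /home/mohammad/demo.txt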
A few comments from readers:
"Hi, thanks for the post. Is there a way to fill two or more HBase columns at once? Cheers."
"Hi, thanks for the post. It is working fine, but I have a doubt about how it is going to store the data from the dump in the HBase table."
My reply: "You are welcome. Could you please tell me which dump you are talking about? Is it the RDBMS dump?"