Thursday, June 27, 2013

Visualizing Pig Queries Through Lipstick


Quite often while working with Pig, you may have found that your scripts have grown so complex that the flow of execution, and its relation to the MapReduce jobs being executed, has become difficult to visualize. That in turn means extra effort to develop, maintain, debug, and monitor those scripts.

But not anymore. Thankfully, Netflix has developed a tool called Lipstick that enables developers to visualize and monitor the execution of their data flows at a logical level. As an implementation of PigProgressNotificationListener, Lipstick piggybacks on top of every Pig script run with it enabled, notifying a Lipstick server of job executions and periodically reporting progress as the script executes.
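To make the listener idea concrete, here is a minimal, self-contained Java sketch. It does not compile against Pig. `ProgressListener` is a hypothetical, trimmed-down stand-in whose three callbacks are modeled on methods of Pig's `org.apache.pig.tools.pigstats.PigProgressNotificationListener`; the real interface declares more callbacks. `StatusServerClient` and `LipstickStyleListener` are also hypothetical. The client records messages in memory instead of talking to an actual Lipstick server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down stand-in for Pig's
// org.apache.pig.tools.pigstats.PigProgressNotificationListener.
// The real interface declares more callbacks; these three are modeled on it.
interface ProgressListener {
    void jobStartedNotification(String scriptId, String assignedJobId);
    void progressUpdatedNotification(String scriptId, int progress);
    void launchCompletedNotification(String scriptId, int numJobsSucceeded);
}

// Hypothetical server client: records messages in memory instead of
// sending them to a real Lipstick server.
class StatusServerClient {
    final List<String> sent = new ArrayList<>();

    void send(String message) {
        sent.add(message);
    }
}

// Listener that "piggybacks" on a script by forwarding each callback.
class LipstickStyleListener implements ProgressListener {
    private final StatusServerClient client;

    LipstickStyleListener(StatusServerClient client) {
        this.client = client;
    }

    @Override
    public void jobStartedNotification(String scriptId, String assignedJobId) {
        client.send(scriptId + " started MapReduce job " + assignedJobId);
    }

    @Override
    public void progressUpdatedNotification(String scriptId, int progress) {
        client.send(scriptId + " progress " + progress + "%");
    }

    @Override
    public void launchCompletedNotification(String scriptId, int numJobsSucceeded) {
        client.send(scriptId + " finished, " + numJobsSucceeded + " job(s) succeeded");
    }
}

// Simulated driver: plays the part Pig would play when invoking a listener.
public class Main {
    public static void main(String[] args) {
        StatusServerClient client = new StatusServerClient();
        ProgressListener listener = new LipstickStyleListener(client);
        listener.jobStartedNotification("script-1", "job_0001");
        listener.progressUpdatedNotification("script-1", 50);
        listener.launchCompletedNotification("script-1", 1);
        client.sent.forEach(System.out::println);
    }
}
```

The script itself never changes. Pig calls the listener at each stage, and the listener alone decides what to forward to the server.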

Lipstick has some really cool features. For instance, the Lipstick main page lists all the Pig jobs that are currently running or have already run. The following things are displayed for each job:
– User
– Job
– Start Time
– Heartbeat Time (last time a heartbeat was sent)
– Progress, color-coded by state:
             – Blue (running)
             – Green (complete)
             – Red (failed)
             – Orange (terminated)

The job list also lets you:
  • Sort by a column (asc/desc) by clicking its header (User, Job, Start Time, etc.).
  • Search by username or job name.
  • Filter jobs by progress.
  • Use pagination controls (next page, show X jobs per page, etc.).
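The sorting, searching, filtering, and pagination behavior described above comes down to a few list operations. The Java sketch below is purely illustrative and is not Lipstick's code. `Job`, `Status`, and `JobListView` are hypothetical names, and the color mapping simply mirrors the list above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Progress states and their colors, mirroring the list above.
enum Status {
    RUNNING("blue"), COMPLETE("green"), FAILED("red"), TERMINATED("orange");

    final String color;

    Status(String color) {
        this.color = color;
    }
}

// Hypothetical row in the job list (not Lipstick's actual data model).
final class Job {
    final String user;
    final String name;
    final long startTime;     // epoch millis
    final long heartbeatTime; // epoch millis of the last heartbeat
    final Status status;

    Job(String user, String name, long startTime, long heartbeatTime, Status status) {
        this.user = user;
        this.name = name;
        this.startTime = startTime;
        this.heartbeatTime = heartbeatTime;
        this.status = status;
    }
}

final class JobListView {
    // Case-insensitive search on username or job name.
    static List<Job> search(List<Job> jobs, String term) {
        String t = term.toLowerCase(Locale.ROOT);
        return jobs.stream()
                .filter(j -> j.user.toLowerCase(Locale.ROOT).contains(t)
                        || j.name.toLowerCase(Locale.ROOT).contains(t))
                .collect(Collectors.toList());
    }

    // Keep only jobs in the given progress state.
    static List<Job> filterByStatus(List<Job> jobs, Status status) {
        return jobs.stream().filter(j -> j.status == status).collect(Collectors.toList());
    }

    // Sort by a column; clicking the same header again would flip `ascending`.
    static List<Job> sort(List<Job> jobs, Comparator<Job> column, boolean ascending) {
        Comparator<Job> order = ascending ? column : column.reversed();
        return jobs.stream().sorted(order).collect(Collectors.toList());
    }

    // Zero-based page of at most pageSize jobs; empty if past the end.
    static List<Job> page(List<Job> jobs, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, jobs.size());
        int to = Math.min(from + pageSize, jobs.size());
        return new ArrayList<>(jobs.subList(from, to));
    }
}

public class Main {
    public static void main(String[] args) {
        List<Job> jobs = List.of(
                new Job("alice", "daily_rollup", 1000L, 5000L, Status.COMPLETE),
                new Job("bob", "sessionize", 2000L, 6000L, Status.RUNNING),
                new Job("alice", "dedupe", 3000L, 3500L, Status.FAILED));
        // alice's jobs, newest start time first
        List<Job> view = JobListView.sort(JobListView.search(jobs, "alice"),
                Comparator.comparingLong((Job j) -> j.startTime), false);
        for (Job j : view) {
            System.out.println(j.name + " " + j.status.color);
        }
        // prints "dedupe red" then "daily_rollup green"
    }
}
```

Each operation returns a new list, so the UI can chain search, filter, sort, and page in whatever order the user applies them.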

Along with this, Lipstick offers a whole bunch of other cool stuff. You can find more in the Lipstick user guide.

For a detailed overview, you can read the announcement on their official blog. And if you can't wait any longer and want to give it a try straight away, you can go directly to their repository.
