Thursday, January 31, 2013

Salesforce.com's Phoenix : SQL layer for your HBase

Ever wished you could write SQL queries against your data stored in HBase? I know your answer is going to be Hive. But I am talking about something which doesn't incur heavy start-up costs and which is built on native HBase APIs rather than going through the MapReduce framework. No need to worry. Salesforce.com comes to the rescue this time. Salesforce.com has recently announced Phoenix, an SQL layer over HBase. What do I mean by that?

Phoenix is an SQL layer over HBase delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. Cool, isn't it?

Phoenix doesn't depend on MapReduce, but that doesn't mean it ignores the philosophy of bringing computation closer to the data. It does exactly that, through :
      Coprocessors : to perform operations on the server side, thus minimizing client/server data transfer
      Custom filters : to prune data as close to the source as possible
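
To make the "computation closer to the data" idea concrete, here is a minimal toy sketch. It is not Phoenix or HBase code: plain Python lists stand in for regions, and the names `REGIONS`, `count_client_side` and `count_server_side` are made up for illustration. It only compares how many items travel to the client when the counting happens on the client versus next to the data.

```python
from collections import Counter

# Hypothetical data: three "regions", each holding 1000 (row_key, host) rows.
REGIONS = [
    [(f"r{region}-{i}", f"web{i % 3}") for i in range(1000)]
    for region in range(3)
]


def count_client_side(regions):
    """Ship every row to the client, then count rows per host there."""
    shipped = [row for region in regions for row in region]
    counts = Counter(host for _, host in shipped)
    return counts, len(shipped)


def count_server_side(regions):
    """Each region counts locally (the coprocessor-style step) and
    ships only its small partial result to the client."""
    partials = [Counter(host for _, host in region) for region in regions]
    total = Counter()
    for partial in partials:
        total.update(partial)  # the client merges the partial counts
    items_shipped = sum(len(partial) for partial in partials)
    return total, items_shipped


client_counts, client_shipped = count_client_side(REGIONS)
server_counts, server_shipped = count_server_side(REGIONS)
print(client_counts == server_counts)  # same answer either way
print(client_shipped, server_shipped)  # rows shipped vs partial counts shipped
```

Both approaches give the same counts, but the second one moves only a handful of partial results instead of every row, which is the point of doing the work on the server side.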

And the best part is that all this comes with no adverse effect on performance.

Below are a couple of graphs showing the relative performance of Phoenix and some related products (Courtesy : Phoenix GitHub page)

Phoenix vs Hive (running over HDFS and HBase)


Phoenix vs Impala (running over HBase)


The performance, as you can see from these graphs, is quite good. For detailed info you can visit this link.

Phoenix stores table metadata in an HBase table and keeps it versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.

Phoenix SQL Support

Phoenix supports all the typical SQL query statement clauses, including SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, etc. It also supports a full set of DML commands, as well as table creation and versioned incremental alterations through its DDL commands. The project tries to follow the SQL standards wherever possible. For the complete list of what Phoenix supports, you can visit the language reference page.

But, there are certain things which Phoenix doesn't support as of now. They include :
       Joins : Single table only currently.
       Derived tables : Nested queries along with TopN queries are coming soon.
       Relational operators : Union, Intersect, Minus.
       Miscellaneous built-in functions.

I don't think that's bad, considering that Phoenix has just been born :)

For in-depth info about Phoenix, you can visit the Phoenix wiki.

In the next post I'll try to write about building Phoenix, with some hands-on examples. Stay connected till then.

Friday, January 25, 2013

Audi's Autonomous Piloted Parking : Real Life Transformers???

Ever wished for a car that would make you feel like 007? No??? No need to worry. Soon you will be able to have one (hopefully ;) ). How?? Keep on reading...

At the Consumer Electronics Show (CES), which was held in Las Vegas a few days ago, Audi showed something which will definitely blow your mind. CES is considered the world’s most important electronics show. There, Audi showed off several new technologies, one of them being a self-driving Audi A7 demonstrating the brand’s piloted parking system. It reminds me of the James Bond movie Tomorrow Never Dies, where Bond operates his BMW 750iL with a remote control. Here we have Annie Lien (Senior Engineer, Electronics Lab at Volkswagen Group of America), who shows us a cool demo of this awesome feature. She directs an Audi A7 using her phone and it parks itself at the Mandarin Oriental Hotel Las Vegas; on leaving the hotel, she summons it from the parking space back to her. They call it Park Assist. And this is what Audi has to say about its Park Assist technology :

"Audi’s automatic parking systems operate by means of either ultrasound or cameras, which display images via the onboard monitor. One   particularly convenient solution is park assist. When backing into a parking space, it performs all the necessary steering movements; it can handle both parallel parking and parking perpendicular to the street.

The system finds a parking space with ultrasound sensors that scan the roadside in two dimensions while driving at moderate speed. The system notifies the driver via a message in the display once the sensors have found a space which is large enough.

If the driver wishes to park in the space, he or she shifts into reverse and the park assist system takes over the steering. The driver must accelerate, shift gears, and brake. When parallel parking, the detected space is large enough if it is about 80 centimeters (2.62 ft) longer than the vehicle itself. Park assist can perform multi-point parking maneuvers and also offers support in leaving parallel parking spaces.

Another technology from Audi is the parking system plus with surround view cameras. Four small cameras – in the single-frame grille, at the rear and in the side mirror housings – record the vehicle’s immediate surroundings. The driver can call up a variety of views on the large onboard monitor, including a top-down virtual view. On corners or junctions with an obstructed view, the system can analyze cross-traffic otherwise invisible to the driver in front of or behind the vehicle."


To get a feel for it, you can watch this video :


Hope you enjoyed this. Stay connected for more.

Thursday, January 17, 2013

Google Spanner : The Future Of NoSQL

Quite often, while working with HBase, I used to think how cool it would be to have a database that could replicate my data consistently to datacenters across the world, so that I could enjoy global availability and geographic locality. One that would keep my data safe even in case of a catastrophe or natural disaster, that supports general-purpose transactions, and that provides an SQL-based query language along with other features of an SQL database. It was only recently that I found out this is not imagination anymore.

I was sitting with a senior and friend of mine at a nearby Cafe Coffee Day, having a casual chat about Big Data stuff. During the discussion he told me about something called SPANNER.
(You might be wondering why the heck I have emphasized the word spanner so much. Believe me, you will do the same after reading this post.)

After that meeting I almost forgot about it. Out of the blue, the word spanner flashed back into my mind two days ago, so I started googling it, and the search led me to this Google research page, which just blew my mind. Google has already been working extensively on something which they call Spanner.

Spanner is a scalable, globally-distributed database designed, built, and deployed at Google. At the highest level of abstraction, it is a database that shards data across many sets of Paxos state machines in datacenters spread all over the world. Replication is used for global availability and geographic locality; clients automatically fail over between replicas. Spanner automatically reshards data across machines as the amount of data or the number of servers changes, and it automatically migrates data across machines (even across datacenters) to balance load and in response to failures. Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of rows. Applications can use Spanner for high availability, even in the face of wide-area natural disasters, by replicating their data within or even across continents.

We can think of Spanner as a globally-distributed database that may spread across continents, covering the planet. Spanner provides several very interesting features :
1 : The replication configurations for data can be controlled dynamically by applications in a fine-grained manner.
2 : It gives us the ability to control which datacenters contain which data.
3 : To control read latency, it gives applications the ability to decide how far data is from its users, and so on.

But there are 2 things which really stand out : externally consistent reads and writes, and globally consistent reads across the database at a timestamp. Both these things are really difficult to implement in a distributed database. These features enable Spanner to support consistent backups, consistent MapReduce executions, and atomic schema updates, all at global scale, and even in the presence of ongoing transactions.
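
To get an intuition for what a "read at a timestamp" buys you, here is a minimal toy sketch. It is a single-process multi-version store in plain Python, and it says nothing about how Spanner actually implements this across datacenters; the class name `MultiVersionStore` and its methods are hypothetical. The idea it illustrates: every write keeps its commit timestamp, so a read at timestamp t sees, for every key, the latest value committed at or before t, and therefore never sees half of a transaction.

```python
import bisect


class MultiVersionStore:
    """Toy multi-version key-value store (illustration only)."""

    def __init__(self):
        self._timestamps = {}  # key -> commit timestamps, ascending
        self._values = {}      # key -> values, parallel to _timestamps

    def commit(self, timestamp, writes):
        """Apply all writes of one transaction at a single commit timestamp."""
        # Validate first, so a rejected transaction changes nothing.
        for key in writes:
            history = self._timestamps.get(key)
            if history and timestamp <= history[-1]:
                raise ValueError("commit timestamps must increase per key")
        for key, value in writes.items():
            self._timestamps.setdefault(key, []).append(timestamp)
            self._values.setdefault(key, []).append(value)

    def read_at(self, timestamp, keys):
        """Return, for each key, the latest value committed at or before timestamp."""
        snapshot = {}
        for key in keys:
            stamps = self._timestamps.get(key, [])
            idx = bisect.bisect_right(stamps, timestamp) - 1
            snapshot[key] = self._values[key][idx] if idx >= 0 else None
        return snapshot


store = MultiVersionStore()
store.commit(10, {"alice": 100, "bob": 0})
store.commit(20, {"alice": 70, "bob": 30})  # a transfer of 30 from alice to bob

before = store.read_at(15, ["alice", "bob"])  # snapshot taken before the transfer
after = store.read_at(20, ["alice", "bob"])   # snapshot that includes the transfer
print(before, after)
```

In both snapshots the two balances add up to 100, which is the kind of guarantee that makes consistent backups possible while transactions keep running.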

A few words on the structure :

A Spanner deployment is called a universe. Spanner is organized as a set of zones, where each zone is somewhat like a Bigtable deployment. Zones can be added to or removed from a running system as new datacenters are brought into service and old ones are turned off. The set of zones is also the set of locations across which data can be replicated. The figure drawn below shows the Spanner server organization :


A zone has one zonemaster and between one hundred and several thousand spanservers. The former assigns data to spanservers; the latter serve data to clients. The per-zone location proxies are used by clients to locate the spanservers assigned to serve their data. The universe master and the placement driver are currently singletons. The universe master is primarily a console that displays status information about all the zones for interactive debugging. The placement driver handles automated movement of data across zones on the timescale of minutes. The placement driver periodically communicates with the spanservers to find data that needs to be moved, either to meet updated replication constraints or to balance load.
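
The division of roles inside a zone can be sketched as a toy model. This is purely illustrative and not based on any real Spanner interface: the `Zone` class, its method names, and especially the "least-loaded spanserver" assignment policy are my own assumptions. It only mirrors the description above, where the zonemaster assigns data to spanservers and the location proxy tells clients which spanserver to contact.

```python
class Zone:
    """Toy zone with a zonemaster role and a location-proxy role."""

    def __init__(self, name, spanservers):
        if not spanservers:
            raise ValueError("a zone needs at least one spanserver")
        self.name = name
        self.spanservers = list(spanservers)
        self._assignment = {}  # data range name -> spanserver name

    def assign(self, data_range):
        """Zonemaster role: pick a spanserver for a data range.
        Assumed policy: least-loaded server, ties broken by name."""
        if data_range in self._assignment:
            return self._assignment[data_range]
        load = {server: 0 for server in self.spanservers}
        for server in self._assignment.values():
            load[server] += 1
        chosen = min(load, key=lambda server: (load[server], server))
        self._assignment[data_range] = chosen
        return chosen

    def locate(self, data_range):
        """Location-proxy role: tell a client which spanserver serves the range."""
        return self._assignment[data_range]


zone = Zone("zone-a", ["span1", "span2"])
for data_range in ["users", "orders", "logs"]:
    zone.assign(data_range)

print(zone.locate("orders"))
```

A client never asks the zonemaster directly in this sketch; it goes through `locate`, much like the per-zone location proxies described above.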

For detailed info you can download the original paper (used as the reference) from here.

I hope you enjoyed reading this post and learning about Spanner as much as I did. Don't forget to leave your comments and/or suggestions. Thank you.

Wednesday, January 9, 2013

This is what 128GB of RAM looks like


Isn't it awesome?????



Premium Suite For Samsung Galaxy Note

If you are envious of your friends or colleagues who are flaunting their new Galaxy Note-II with the awesome Jelly Bean, you don't have to be anymore. Samsung is there to help you out. They have recently announced the Premium Suite Upgrade for the original Galaxy Note, just like they did for the Galaxy S-III some days ago.

Although they haven't announced an exact date yet, it is expected to arrive soon. For the most up-to-date info you can always visit the official link. This Premium Upgrade includes all of the latest features, like Multi-Window, along with the latest Android version, Jelly Bean. Here is a list of the cool features that are bundled with the upgrade.

1. Multi-Window : The Multi-Window feature allows us to do multiple tasks on the same screen simultaneously. It not only gives us a great level of comfort but also looks damn cool.

2. Popup Note / Video / Browser : Popup Note lets you jot down notes just by pulling out the S Pen or double-tapping the screen, whereas Popup Video and Popup Browser allow users to watch videos or surf the internet while doing other tasks on the same screen.

3. Photo Note / Photo Frame : The Photo Note feature allows users to write notes on pictures.

4. Easy Clip : Easy Clip lets users crop an image from any source screen to save it or share it with added text. Users can also select text with Easy Clip by drawing a single line on it.

5. Paper Artist : Paper Artist provides photo editing by adding various built-in editing effects.

6. Handwriting on S Planner/ Email : Users can write notes in their own handwriting in S Planner and even send handwritten notes through Email.

7. Enhanced S Note : Users can add Sketch effects and image filters to photos, and use the Color Picker with the S Pen.

Another cool thing about this upgrade is that it includes the latest Android 4.1 (Jelly Bean) with Project Butter, which smooths the overall performance of the UI and enhances the graphics, along with the awesome Google Now. On top of this, rumors are floating around that the new Premium Suite upgrade for the Samsung Galaxy Note will also include Air View. I am not 100% sure of it right now, but I'll keep you updated as soon as I get any news.
