Cloudera Impala provides fast, interactive SQL queries directly on Apache Hadoop data stored in HDFS or HBase. In addition to sharing the same storage platform, Impala uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive. This gives you a familiar, unified platform for both batch-oriented and real-time queries. Impala is an addition to the tools available for querying big data; it does not replace batch processing frameworks built on MapReduce, such as Hive. Those frameworks remain best suited to long-running batch jobs, such as Extract, Transform, and Load (ETL) workloads.
Cloudera Impala Diagram
The Impala solution is composed of the following components:
1. Impala State Store - The state store coordinates information about all instances of impalad running in your environment. This information is used to locate data so that the distributed resources can respond to queries.
2. impalad - This process runs on DataNodes and responds to queries from the Impala shell. It receives requests from the database connector layer and schedules tasks for optimal execution. Periodically, each impalad reports its name and address to the Impala State Store.
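The relationship between the two components can be pictured with a small toy model. The Python sketch below is only an illustration of the idea described above: each impalad periodically reports its name and address, and the state store keeps track of which instances are currently known. The class name `StateStore`, the `heartbeat` and `live_members` methods, the timeout value, and the addresses are all made up for this example; this is not Impala's actual protocol or API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Toy registry of impalad instances (hypothetical, not Impala's real protocol)."""

    timeout_s: float = 10.0  # assumed: how long an instance stays "live" without reporting
    _members: dict = field(default_factory=dict)  # name -> (address, last_seen)

    def heartbeat(self, name, address, now=None):
        """Record that an impalad reported its name and address."""
        now = time.monotonic() if now is None else now
        self._members[name] = (address, now)

    def live_members(self, now=None):
        """Return name -> address for instances that reported recently enough."""
        now = time.monotonic() if now is None else now
        return {
            name: address
            for name, (address, seen) in self._members.items()
            if now - seen <= self.timeout_s
        }


# Two hypothetical impalad instances register; only one keeps reporting.
store = StateStore(timeout_s=10.0)
store.heartbeat("impalad-1", "10.0.0.1:22000", now=0.0)
store.heartbeat("impalad-2", "10.0.0.2:22000", now=0.0)
store.heartbeat("impalad-1", "10.0.0.1:22000", now=8.0)

live = store.live_members(now=12.0)
print(live)  # {'impalad-1': '10.0.0.1:22000'}
```

In this sketch, an instance that stops reporting simply drops out of the live set after the timeout, so a query planner consulting the registry would only see instances that are still available.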
More information about Impala is available on the Cloudera Impala page.