hive, apache-spark-sql, hadoop2

Impala or Hive with Spark as the execution engine?


I want to design a web UI that fetches data stored in HDFS and generates reports from it in my own custom report format. I am writing REST APIs to fetch the data, but running Hive queries is too slow, so I want a different approach. I can think of these options:

  1. Using Impala to create the tables. But I am not sure about REST support for Impala.

  2. Using Hive, but with Spark instead of MapReduce as the execution engine.

  3. Using spark-job-server, which provides REST support, and fetching the data with Spark SQL.

Which of these approaches would be suitable, or is there a better one? Any help is appreciated, as I am very new to this.


Solution

  • I'd choose Impala if latency is the main consideration. It is dedicated to SQL processing on HDFS and does it well. As for the REST API and the application logic you are building, this seems to be a good example
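
Since the question was unsure about REST support for Impala, here is a minimal sketch of one way to put your own REST layer in front of it. It is not an official Impala REST API. It assumes the `impyla` package is installed (its `impala.dbapi.connect` gives a DB-API 2.0 connection), and the host name, port 8080, the `/report` path, and the `sales` table are all hypothetical placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical report query; the table and column names are placeholders.
REPORT_SQL = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"


def impala_connection():
    """Open a DB-API connection to Impala (assumes impyla is installed)."""
    # Imported lazily so the rest of the module works without impyla.
    from impala.dbapi import connect
    # Hypothetical host; 21050 is the usual Impala HiveServer2 port.
    return connect(host="impala-host.example.com", port=21050)


def run_report(conn, sql):
    """Run sql on any DB-API 2.0 connection and return the rows as dicts."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


def make_handler(connect_fn, sql):
    """Build a handler class that serves the report as JSON on GET /report."""
    class ReportHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/report":
                self.send_error(404)
                return
            conn = connect_fn()
            try:
                body = json.dumps(run_report(conn, sql)).encode("utf-8")
            finally:
                conn.close()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ReportHandler


def serve(port=8080):
    """Start the (hypothetical) report service; blocks until interrupted."""
    handler = make_handler(impala_connection, REPORT_SQL)
    HTTPServer(("0.0.0.0", port), handler).serve_forever()
```

Calling `serve()` starts the service, and `GET /report` then returns the query result as a JSON array. Because `run_report` only relies on the DB-API cursor interface, the same function can be pointed at another DB-API connection later without touching the REST layer. Opening a new connection per request keeps the sketch short; a real service would probably want pooling.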