hadoop hive hue

Is there a configuration setting for Hive on Hue (CDH 5.9.3) that limits the number of containers that can be used?


This is a general problem in our group: our Hive queries frequently scale up to consume most of the available YARN containers and memory on our CDH cluster. The underlying causes are the number of partitions in our tables and the complexity of our joins, but we aren't free to rebuild those tables. In Spark, we can control resource consumption by configuring spark.dynamicAllocation.maxExecutors and spark.executor.memory. Is there something similar we can use in Hue so that Hue will 'play well' with other jobs on the cluster?


Solution

  • Yes, you can limit the cluster compute resources that your Hue-launched Hive queries consume.

    To do so, first configure YARN scheduler queues; in Cloudera's CDH distribution, these are called Dynamic Resource Pools.

    You can learn more about this topic in the CDH Documentation.
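
    As a rough illustration of what such a pool limits, here is a sketch of the Fair Scheduler allocation file that a pool definition corresponds to. On CDH you would normally define pools in Cloudera Manager rather than edit this file by hand. The pool names match the example queue used later in this answer, and the resource figures are placeholder assumptions, not recommendations.

    ```xml
    <!-- Sketch only: pool names and limits are assumptions for illustration. -->
    <allocations>
      <queue name="interactive">
        <queue name="hive_queue">
          <!-- Upper bound on the total memory and vcores allocated to this pool. -->
          <maxResources>40960 mb, 20 vcores</maxResources>
          <!-- Upper bound on the number of applications running at once in this pool. -->
          <maxRunningApps>5</maxRunningApps>
        </queue>
      </queue>
    </allocations>
    ```

    Because every container a query requests counts against maxResources, a cap like this is probably the closest equivalent to Spark's maxExecutors setting: queries sent to the pool wait for resources inside it instead of spreading across the whole cluster.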

    Once you have configured a pool intended for your Hue-launched, semi-interactive Hive queries, you can direct an individual query to this pool by setting the mapred.job.queue.name key to the resource pool name.

    Suppose our queue is named interactive.hive_queue. We would prepend this SET statement to our HiveQL query statement:

    SET mapred.job.queue.name=interactive.hive_queue;
    

    You may need to update your Hue configuration (hue.ini) to allow your Hue users to pass this configuration value through.

    Reference: HiveQL Language Manual

    You should also be able to create a saved Hive configuration in Hue so that your Hue-launched Hive queries always use this YARN queue.

    Reference: hiveserver2.py

    (This assumes you are using the MapReduce (MR2) execution engine for your Hive queries.)

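    If you want to change the queue for all of your Hive queries, you can do this in the HiveServer2 configuration, hive-site.xml. Note that mapreduce.job.queuename is the current name of this property; mapred.job.queue.name, used in the SET statement above, is its deprecated alias. The change would look like: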

    <property>
       <name>mapreduce.job.queuename</name>
       <value>interactive.hive_queue</value>
    </property>