Search code examples
hadoophivemapr

Hadoop, hive -> get list of sql being run against the cluster


So we have a group of people hitting our cluster and would like to monitor every SQL statement being run via hive/odbc. The job history server web page will give me part of the SQL but not everything. Is there a way to retrieve the full SQL of commands being run? Could be via web GUI or command line. I want to monitor for inefficient queries and send out warnings to developers.

Cluster is 10 node MapR cluster running Mapr 5.0 in yarn mode


Solution

  • I found the solution. SSH into the machine running hiveserver2 and execute the command below. Replace with the user you are interested in (or * if you want them all) and replace hive-0.13 with the version of hive you are using.

    cat /opt/mapr/hive/hive-0.13/logs/<user>/hive.log | grep "Starting command"

    EDIT: in the year since I wrote this is appears it has changed to:

    cat /opt/mapr/hive/hive-2.1/logs/<user>/hive.log | grep "Executing command"

    EDIT Take 2. The above doesn't deal with multi-line sql and shows only the first line. For many of the queries it was simply showing "SELECT" and nothing more. The below solves the issue. I had a nice sed solution but it didn't work with tail -f for some reason, so Perl it is. Unusual for Perl but it's actually more readable than the sed solution.

    cat /opt/mapr/hive/hive-2.3/logs/<user>/hive.log | perl -ne 's/\r\n/ /g; print;' | grep "Executing command"