How to run a Hive query and get the applicationID from the log


I'm writing a shell script that executes a hive command, writing the log and output information to two separate files:

hive -S -f pdr_extrator.sql 2> pdr_extrator_log.txt | sed 's/[\t]/|/g' 1> pdr_extrator_out.txt

The log file, at the end of the execution, is as follows:

log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.6.0-2800/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.6.0-2800/hive/lib/hive-jdbc-0.14.0.2.2.6.0-2800-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

When I run via the command line, it is possible to obtain the applicationID of my specific query, as shown below:

[Image: ApplicationID - Hive command line]

Is there any way to get the applicationID from the log?

At the moment I run yarn application -list -appTypes TEZ and look for the application that appears around the time my query starts. I then run yarn application -status application_XXXXX to monitor only my execution.

This method is unreliable, because another application may enter the queue at about the same time and I cannot tell which one is mine.

Your help is appreciated.


Solution

  • You are running the Hive query file with the -S (silent) option, which suppresses the log output that contains the YARN application ID.

    Try running it without -S:

    hive -f pdr_extrator.sql
    

    You should then see a log line like the one below on the console, or in the file if the output is redirected.

    Status: Running (Executing on YARN cluster with App id application_1579987899994_341626)
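
    Once that status line is in the log, the ID can be pulled out with grep. This is a minimal sketch, assuming the log contains a line like the one above and that the ID has the form application_<digits>_<digits>. The function name extract_app_id is a hypothetical helper, not part of Hive or YARN.

```shell
# Hypothetical helper: print the first YARN application ID found in a log file.
# Assumes IDs look like application_<digits>_<digits>, as in the status line above.
extract_app_id() {
    log_file="$1"
    # -o prints only the matching part of each line; head keeps the first match.
    grep -oE 'application_[0-9]+_[0-9]+' "$log_file" | head -n 1
}
```

    For example, with the log redirected to pdr_extrator_log.txt as in the question (and -S removed), app_id=$(extract_app_id pdr_extrator_log.txt) followed by yarn application -status "$app_id" would check only that run. If the log has no matching line, the function prints nothing, so the script should test for an empty value before calling yarn.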