Tags: logging, hadoop-yarn

How can I access the first attempt's yarn log?


If I append an _attemptid suffix to the application ID, do I get that attempt's log? Like this:

yarn logs -applicationId application_11112222333333_444444_1

Strangely, I couldn't find an answer to this on the web.

UPDATE: Let me rephrase my question: How can I access a given attempt's yarn log?


Solution

  • Here's a somewhat ugly but working solution in several steps (for hadoop-2.6). Each attempt runs in its own container, so to get the logs of a specific container you need to know the applicationId, the containerId, and the NodeManager address. For example, to get the logs for appattempt_1:

    1. To get info about the attempt (its containerId and host URL), run yarn applicationattempt -list application_ID_1. You'll get something like this:
     ======================== ======== ==================== ===========================
     ApplicationAttempt-Id    State    AM-Container-Id      Tracking-URL
     ======================== ======== ==================== ===========================
     appattempt_1             FAILED   container_1          https://host1:8090/blabla
     appattempt_2             KILLED   container_2          https://host2:8090/blabla
     ======================== ======== ==================== ===========================
    
    2. To convert the tracking URL's host to a NodeManager address:

       $ yarn node -list -all | grep host1 | awk '{print $1}'
       host1:8041

    3. Fetch the logs: yarn logs -applicationId application_ID_1 -containerId container_1 -nodeAddress host1:8041
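The three hadoop-2.6 steps can be chained in a small POSIX shell script. This is a sketch, not part of the original answer: the function names are hypothetical, the parsing assumes the column layout of the sample table (attempt ID, state, AM container ID, tracking URL), and, like the manual steps, it assumes the tracking URL's host is the node that ran the attempt's AM container.

```shell
#!/bin/sh
# Sketch of the three hadoop-2.6 steps. Assumptions (not guaranteed by YARN):
# - `yarn applicationattempt -list` rows look like the sample table
#   (attempt ID, state, AM container ID, tracking URL);
# - the tracking URL's host is the node that ran the AM container;
# - `yarn node -list -all` prints the Node-Id (host:port) in the first column.

# Step 1: read `yarn applicationattempt -list` output on stdin and print
# "<AM container ID> <host>" for the requested attempt ID.
attempt_container_and_host() {
  awk -v id="$1" '$1 == id {
    split($4, parts, "/")   # "https://host1:8090/x" -> "https:", "", "host1:8090", "x"
    host = parts[3]
    sub(/:.*/, "", host)    # drop the port, keep only the host name
    print $3, host
    exit
  }'
}

# Step 2: read `yarn node -list -all` output on stdin and print the Node-Id
# whose host part matches exactly ("host1" will not match "host10:...").
node_address_for_host() {
  awk -v h="$1" 'index($1, h ":") == 1 { print $1; exit }'
}

# Step 3: chain the lookups and fetch the logs of one attempt's AM container.
fetch_attempt_logs() {
  local app_id="$1" attempt_id="$2" info container host node_addr
  info=$(yarn applicationattempt -list "$app_id" | attempt_container_and_host "$attempt_id")
  if [ -z "$info" ]; then
    echo "attempt $attempt_id not found for $app_id" >&2
    return 1
  fi
  container=${info%% *}   # text before the first space
  host=${info#* }         # text after the first space
  node_addr=$(yarn node -list -all | node_address_for_host "$host")
  if [ -z "$node_addr" ]; then
    echo "no NodeManager found for host $host" >&2
    return 1
  fi
  yarn logs -applicationId "$app_id" -containerId "$container" -nodeAddress "$node_addr"
}
```

Usage, for the example above: fetch_attempt_logs application_ID_1 appattempt_1. Matching on the "host:" prefix rather than a plain grep avoids picking up a node such as host10 when looking for host1.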

    In hadoop-2.7 you can just use:

    yarn logs -applicationId <application ID> [OPTIONS]
    
    general options are:
     -am    Prints the AM Container logs for this application. Specify
            comma-separated value to get logs for related AM Container.
            For example, If we specify -am 1,2, we will get the logs for
            the first AM Container as well as the second AM Container. To
            get logs for all AM Containers, use -am ALL. To get logs for
            the latest AM Container, use -am -1. By default, it will print
            all available logs. Work with -log_files to get only specific
            logs.
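For the original question, the first attempt's AM container logs would then be: yarn logs -applicationId application_ID_1 -am 1. As a sketch, each attempt's AM logs can also be saved to a separate file. The helper name and the file naming are hypothetical, and it assumes hadoop-2.7+ with log aggregation enabled so that yarn logs can find the logs.

```shell
#!/bin/sh
# Hypothetical helper: write the AM container logs of attempts 1..N to one
# file per attempt, e.g. application_ID_1_am1.log, application_ID_1_am2.log.
# Assumes hadoop-2.7+ (for -am) and that log aggregation is enabled.
save_am_logs() {
  local app_id="$1" n="$2" i=1
  while [ "$i" -le "$n" ]; do
    # -am <i> selects the i-th AM container, i.e. the AM of the i-th attempt.
    yarn logs -applicationId "$app_id" -am "$i" > "${app_id}_am${i}.log" || return 1
    i=$((i + 1))
  done
}
```

For instance, save_am_logs application_ID_1 2 would write the AM logs of both attempts from the sample table. A loop over single -am values is used instead of -am 1,2 so that each attempt's output ends up in its own file.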