hadoop command-line cloudera

How do I get the success/failure status of a Hadoop job from the command line?


I'm using CDH4 with MRv1. As far as I can tell, there is no command-line tool for checking the status of a completed job. The job detail page in the web console shows "Status: Failed" or "Status: Succeeded", but neither mapred job -list all nor mapred job -status job_201309231203_0011 reports "Failed" or "Succeeded".

Am I missing some other command?


Solution

  • The first couple of lines of output from hadoop job -list all are:

    X jobs submitted
    States are:
            Running : 1     Succeded : 2    Failed : 3      Prep : 4
    JobId   State   StartTime       UserName        Priority        SchedulingInfo
    

    The per-job lines that follow look like:

    job_201309171413_38136  1       1382455374980   somebody        NORMAL  0 running map tasks using 0 map slots. 0 additional slots reserved. 1 running reduce tasks using 1 reduce slots. 0 additional slots reserved.
    job_201309171413_37222  2       1382430339635   somebody        NORMAL  0 running map tasks using 0 map slots. 0 additional slots reserved. 0 running reduce tasks using 0 reduce slots. 0 additional slots reserved.
    

    The second column is the job's state. According to the header, 1 means Running and 2 means Succeeded (the header itself spells it "Succeded"). The format is not the clearest: there are four header lines, you have to consult the header to decode the state codes, and there is no way to get the state of just one job.

    One way to extract the state of a specific job from this output is with awk:

    $ job_id=job_201309171413_38136
    $ hadoop job -list all | awk -v job_id=${job_id} 'BEGIN{OFS="\t"; FS="\t"; final_state="Unknown"} $0 == "States are:" {getline; for(i=1;i<=NF;i++) { split($i,s," "); states[s[3]] = s[1] }} $1==job_id { final_state=states[$2]; exit} END{print final_state}'
    Running
    
    $ job_id=job_201309171413_37222
    $ hadoop job -list all | awk -v job_id=${job_id} 'BEGIN{OFS="\t"; FS="\t"; final_state="Unknown"} $0 == "States are:" {getline; for(i=1;i<=NF;i++) { split($i,s," "); states[s[3]] = s[1] }} $1==job_id { final_state=states[$2]; exit} END{print final_state}'
    Succeded
    
    $ job_id=foobar
    $ hadoop job -list all | awk -v job_id=${job_id} 'BEGIN{OFS="\t"; FS="\t"; final_state="Unknown"} $0 == "States are:" {getline; for(i=1;i<=NF;i++) { split($i,s," "); states[s[3]] = s[1] }} $1==job_id { final_state=states[$2]; exit} END{print final_state}'
    Unknown
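
If you need this in a script, the one-liner above can be wrapped in a small function. The following is only a sketch: job_state and state_to_exit are hypothetical helper names (not part of Hadoop), the exit-code convention is my own choice, and it assumes the listing has the tab-separated layout shown above. The function reads the listing on stdin so you can pipe hadoop job -list all into it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads `hadoop job -list all`
# output on stdin and prints the state name of the job ID given as $1.
job_state() {
    awk -v job_id="$1" '
        BEGIN { FS = "\t"; final_state = "Unknown" }
        # The line after "States are:" holds the code-to-name legend.
        $0 == "States are:" {
            getline
            for (i = 1; i <= NF; i++) {
                # Each legend entry looks like "Running : 1".
                if (split($i, s, " ") == 3) states[s[3]] = s[1]
            }
            next
        }
        # First column is the job ID, second column is the numeric state.
        $1 == job_id {
            if ($2 in states) final_state = states[$2]
            exit
        }
        END { print final_state }
    '
}

# Hypothetical convention for scripts: return 0 if the job succeeded,
# 1 if it failed, 2 for anything else (Running, Prep, Unknown).
state_to_exit() {
    case "$1" in
        # Accept the spelling printed in the header as well as the correct one.
        Succeded|Succeeded) return 0 ;;
        Failed) return 1 ;;
        *) return 2 ;;
    esac
}

# Example usage (commented out so that sourcing this file runs nothing):
# state=$(hadoop job -list all | job_state job_201309171413_37222)
# state_to_exit "$state" && echo "job succeeded"
```

Because the state names are taken from the header rather than hard-coded, any additional states your version prints in the "States are:" line should be picked up as well; they simply fall into the "anything else" exit code.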