Tags: hadoop, apache-pig

How to keep the Pig job log file when the job is successful?


I noticed that when a Pig script fails, a log file is generated and kept, but when there is no error, the log file is removed. Is there a way to keep the log file even when the job is successful?


Solution

  • By default, only errors (e.g., script parsing errors) are logged to pig.logfile, which can be set in $PIG_HOME/conf/pig.properties. If you want status messages logged as well, prepare a valid log4j.properties file and point the log4jconf property at it (a sketch of that wiring follows the config below).

    For example, rename log4j.properties.template to log4j.properties in $PIG_HOME/conf and set the following:

    log4j.logger.org.apache.pig=info, B
    
    # ***** A is set to be a ConsoleAppender.
    #log4j.appender.A=org.apache.log4j.ConsoleAppender
    # ***** A uses PatternLayout.
    #log4j.appender.A.layout=org.apache.log4j.PatternLayout
    #log4j.appender.A.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
    
    # ***** B is set to be a FileAppender.
    log4j.appender.B=org.apache.log4j.FileAppender
    #log4j.appender.B.File=/home/user/pig-distrib/logs/pig_success.log
    log4j.appender.B.File=/home/user/pig-distrib/logs/pig.log
    log4j.appender.B.layout=org.apache.log4j.PatternLayout
    log4j.appender.B.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
    log4j.appender.B.Append=true
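
    With the appenders in place, Pig still has to be told to use this log4j file. A minimal sketch of that wiring, assuming the install lives under /home/user/pig-distrib as in the appender path above (myscript.pig is a placeholder):

    # in $PIG_HOME/conf/pig.properties: make Pig load the log4j config above
    log4jconf=/home/user/pig-distrib/conf/log4j.properties

    # or override it for a single run with the -4 (-log4jconf) option
    pig -4 /home/user/pig-distrib/conf/log4j.properties myscript.pig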
    


    When using Pig v0.10.0 (r1328203), I found that a successful Pig task doesn't write the job's history logs to the output directory on HDFS
    (hadoop.job.history.user.location=${mapred.output.dir}/_logs/history/).

    If you still want these histories to be kept, set mapred.output.dir explicitly in your Pig script like this:

    set mapred.output.dir '/user/hadoop/test/output';
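
    For context, a minimal end-to-end sketch; the input path, schema, and relation name are placeholders, not from the original answer:

    -- force the job history location by setting mapred.output.dir up front
    set mapred.output.dir '/user/hadoop/test/output';

    -- hypothetical load/store pair writing to that same directory
    raw = LOAD '/user/hadoop/test/input' USING PigStorage('\t') AS (line:chararray);
    STORE raw INTO '/user/hadoop/test/output';

    After a successful run, the histories should then show up under /user/hadoop/test/output/_logs/history/ (e.g. check with hadoop fs -ls).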