I noticed that when there is an error while running a Pig script, a log file is generated and kept. But when there is no error, the log file is removed. Is there a way to keep the log file even when the job succeeds?
By default, errors (e.g. script parsing errors) are logged to pig.logfile, which can be set in $PIG_HOME/conf/pig.properties. If you want status messages logged as well, prepare a valid log4j.properties file and point the log4jconf property at it.
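To wire that up, the two properties would look something like this in pig.properties (a minimal sketch; the file paths are placeholders I've chosen for illustration, not from the original answer):

# in $PIG_HOME/conf/pig.properties
# where parse/run errors go by default
pig.logfile=/home/user/pig-distrib/logs/pig-err.log
# log4j config that also captures status messages
log4jconf=/home/user/pig-distrib/conf/log4j.properties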
For instance, rename log4j.properties.template to log4j.properties in $PIG_HOME/conf and set the following:
log4j.logger.org.apache.pig=info, B
# ***** A is set to be a ConsoleAppender.
#log4j.appender.A=org.apache.log4j.ConsoleAppender
# ***** A uses PatternLayout.
#log4j.appender.A.layout=org.apache.log4j.PatternLayout
#log4j.appender.A.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
# ***** B is set to be a FileAppender.
log4j.appender.B=org.apache.log4j.FileAppender
#log4j.appender.B.File=/home/user/pig-distrib/logs/pig_success.log
log4j.appender.B.File=/home/user/pig-distrib/logs/pig.log
log4j.appender.B.layout=org.apache.log4j.PatternLayout
log4j.appender.B.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
log4j.appender.B.Append=true
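Alternatively, if I recall Pig's command-line options correctly (this is my assumption, not part of the original answer), you can pass the log4j config per run with the -4 (-log4jconf) switch instead of editing pig.properties:

pig -4 /home/user/pig-distrib/conf/log4j.properties myscript.pig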
When using Pig v0.10.0 (r1328203) I found that a successful Pig job doesn't write the job's history logs to the output directory on HDFS (hadoop.job.history.user.location=${mapred.output.dir}/_logs/history/).
If you want to keep these histories in any case, set mapred.output.dir explicitly in your Pig script:
set mapred.output.dir '/user/hadoop/test/output';
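In context, that looks like the following sketch (only the set line comes from the answer; the input path, schema, and relation name are hypothetical):

-- force the history location by fixing mapred.output.dir up front
set mapred.output.dir '/user/hadoop/test/output';
-- load some input; path and schema are made up for illustration
raw = LOAD '/user/hadoop/test/input' USING PigStorage(',') AS (id:int, name:chararray);
-- store to the same directory named in mapred.output.dir,
-- so the _logs/history files land alongside the output
STORE raw INTO '/user/hadoop/test/output';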