Before starting Pig in MapReduce mode, I always have to start the job history server first. Otherwise, when I execute Pig Latin statements, the following log messages are generated:
2018-10-18 15:59:13,709 [main] INFO
org.apache.hadoop.mapred.ClientServiceDelegate - Application state
is completed. FinalApplicationStatus=SUCCEEDED. **Redirecting to job
history server**
2018-10-18 15:59:14,713 [main] INFO org.apache.hadoop.ipc.Client -
Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0
time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
As the logs above show, the Pig execution engine is trying to connect to the history server. What is the role of the job history server in Hadoop, and why does Pig need to connect to it for a MapReduce job?
The JobTracker or ResourceManager keeps all job information in memory. Once a job has finished, it drops that information to avoid running out of memory. Tracking of these past jobs is delegated to the JobHistory server.
The Pig client pulls job counter stats when its jobs finish. Those stats may still be held by the JobTracker/ResourceManager, or Pig may need to ask the JobHistory server for them. When the JobHistory server is down, the client prints those log messages while it retries the connection, but it should eventually still succeed, just with the stats missing.
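If you want to confirm the history server is up before launching Pig, a quick TCP check against its IPC port can help. The following is only a minimal sketch: the function name `history_server_reachable` is made up for illustration, the port 10020 is taken from the log output in the question, and the host is an assumption you would replace with wherever your history server actually runs.

```python
import socket


def history_server_reachable(host="localhost", port=10020, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    host and port are assumptions: 10020 is the port seen in the retry
    log messages, and the host depends on your cluster configuration.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `history_server_reachable("localhost", 10020)` before starting Pig returns `False` when nothing is listening on that port. This only tests that the port accepts connections, not that the service behind it is healthy. On Hadoop 2.x installations the history server is typically started with `mr-jobhistory-daemon.sh start historyserver`.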