I have installed CDH 5.5.1 with Hue, Hadoop, Spark, Hive, Oozie, YARN and ZooKeeper.
When I run a Spark job or a MapReduce job, Hue shows it in the job history. The problem is that when I restart the CDH services (not the physical nodes), all the job history from before the restart is removed.
In HDFS there are several paths that I suspect contain task information and might be the ones that hold the job history. They are:
/tmp/logs/user/logs/
/user/history/done/2016/
I have looked for a relevant setting in the Cloudera Manager configuration page, the Hue configuration page and some configuration files, with no success. I don't know how to prevent this removal. Am I missing something?
If you really just need to see job history on a Hadoop cluster, the YARN History Server should have a history of all YARN jobs run on the cluster.
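As a rough illustration of reading that history directly, the sketch below queries the MapReduce JobHistory Server REST endpoint (`/ws/v1/history/mapreduce/jobs`) and prints the id, name and state of each finished job. The hostname is a placeholder and port 19888 is only the usual default for the history server web UI, so both are assumptions to adjust for your cluster. Note that this endpoint lists MapReduce jobs only, not Spark jobs.

```python
import json
from urllib.request import Request, urlopen

# Assumption: placeholder host; 19888 is the usual default JobHistory Server web port.
HISTORY_SERVER = "http://historyserver.example.com:19888"


def jobs_url(base_url):
    """Build the REST endpoint that lists finished MapReduce jobs."""
    return base_url.rstrip("/") + "/ws/v1/history/mapreduce/jobs"


def parse_jobs(payload):
    """Extract (id, name, state) tuples from the JSON body of the jobs endpoint."""
    jobs = (json.loads(payload).get("jobs") or {}).get("job") or []
    return [(j.get("id"), j.get("name"), j.get("state")) for j in jobs]


def fetch_jobs(base_url=HISTORY_SERVER):
    """Request the job list as JSON and return the parsed tuples."""
    request = Request(jobs_url(base_url), headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return parse_jobs(response.read().decode("utf-8"))


if __name__ == "__main__":
    for job_id, name, state in fetch_jobs():
        print(job_id, name, state)
```

If the jobs from before the restart show up here, the history itself survived and only Hue's view of it is affected.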
Hue has a JIRA ticket for the issue you describe, titled "Job browser should talk to the YARN history server to display old jobs": https://issues.cloudera.org/browse/HUE-2558. Basically, Hue needs to talk to the YARN History Server (not just the Resource Manager) to get the information you're looking for.
The good news is that the ticket appears to have been completed and included in Hue 4.0, which was released on 5/11/2017. The bad news is that Cloudera has not yet shipped a release with that version of Hue rolled in.