
Oozie job expiring on Java action when writing to HDFS

I have an Oozie coordinator that runs a workflow every hour. The workflow is composed of two sequential actions: a shell action and a Java action. When I run the coordinator, the shell action seems to execute successfully. However, when it's time for the Java action, the Job Browser in Hue always shows:

There was a problem communicating with the server: Job application_<java-action-id> has expired.

When I click on the application_id, I get this (screenshot: Oozie Java action fail).

When I looked into the server logs, I found this:

[23/Nov/2015 02:25:22 -0800] middleware   INFO     Processing exception: Job application_1448245438537_0010 has expired.: Traceback (most recent call last):
  File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/", line 371, in inner
    return func(*args, **kwargs)
  File "/usr/lib/hue/apps/jobbrowser/src/jobbrowser/", line 67, in decorate
    raise PopupException(_('Job %s has expired.') % jobid, detail=_('Cannot be found on the History Server.'))
PopupException: Job application_1448245438537_0010 has expired.

The Java action consists of two parts: a REST API call, and writing the parsed result to HDFS (via the Hadoop client library). Even though the Java action job is expiring / failing in the Job Browser, the write to HDFS was successful. Here's a snippet of the HDFS-writing part of the Java code:

FileSystem hdfs = FileSystem.get(new URI(hdfsUriPath), conf);
OutputStream os = hdfs.create(file);
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(os, "UTF-8"));

When I run the workflow standalone, the Java action has roughly a 50-50 chance of succeeding or expiring, but under the coordinator, all Java actions expire.

The YARN logs show this:

 Job commit failed: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(
at org.apache.hadoop.hdfs.DFSClient.create(
at org.apache.hadoop.hdfs.DFSClient.create(
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(
at org.apache.hadoop.hdfs.DistributedFileSystem.create(
at org.apache.hadoop.hdfs.DistributedFileSystem.create(
at org.apache.hadoop.fs.FileSystem.create(
at org.apache.hadoop.fs.FileSystem.create(
at org.apache.hadoop.fs.FileSystem.create(
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

So it looks like the problem is related to closing the FileSystem at the end of my Java code (should I keep the FileSystem open?).

I'm using Cloudera Quickstart CDH 5.4.0 and Oozie 4.1.0


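  • The problem is solved. My Java action uses an instance (say, a variable fs) of the org.apache.hadoop.fs.FileSystem class. At the end of the Java action, I called fs.close(), which caused the problem on the next run of the Oozie job. By default, FileSystem.get() returns a cached instance that is shared within the JVM, so closing it can also close it for other code that still needs it, which is consistent with the "Job commit failed: Filesystem closed" error above. Once I removed this line, everything worked again.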
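To illustrate why removing fs.close() helps, here is a minimal sketch in plain Java. It is a toy model only, not Hadoop code: ToyFileSystem, ToyCache, the placeholder URI and the paths are all hypothetical names made up for this example. It assumes the behaviour that seems to be at play here, namely that a get-style call hands every caller in the JVM the same cached object, so one caller's close() also affects later users such as the job commit step.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class SharedHandleDemo {

    // Toy stand-in for a file system client (hypothetical, NOT the Hadoop class).
    static final class ToyFileSystem {
        private boolean open = true;

        // Fails after close(), similar to the "Filesystem closed" error in the YARN log.
        void create(String path) throws IOException {
            if (!open) {
                throw new IOException("Filesystem closed");
            }
        }

        void close() {
            open = false;
        }
    }

    // Toy cache (hypothetical): one shared instance per URI.
    static final class ToyCache {
        private final Map<String, ToyFileSystem> cache = new HashMap<>();

        // Every caller asking for the same URI receives the same object.
        ToyFileSystem get(String uri) {
            return cache.computeIfAbsent(uri, key -> new ToyFileSystem());
        }

        // A private, uncached instance that the caller may close safely.
        ToyFileSystem newInstance(String uri) {
            return new ToyFileSystem();
        }
    }

    private static final String URI = "hdfs://example-namenode:8020"; // placeholder

    // Returns true if a later create() through the shared instance still works.
    static boolean laterWriteSucceeds(boolean closeSharedInstance) {
        ToyCache cache = new ToyCache();
        ToyFileSystem mine = cache.get(URI);
        ToyFileSystem launcher = cache.get(URI); // same object as 'mine'
        try {
            mine.create("/tmp/result.txt");
            if (closeSharedInstance) {
                mine.close(); // also closes it for 'launcher'
            }
            launcher.create("/tmp/_commit");
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Closing a private instance leaves the shared one usable.
    static boolean privateInstanceCloseIsSafe() {
        ToyCache cache = new ToyCache();
        ToyFileSystem mine = cache.newInstance(URI);
        ToyFileSystem launcher = cache.get(URI);
        try {
            mine.create("/tmp/result.txt");
            mine.close();
            launcher.create("/tmp/_commit");
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(laterWriteSucceeds(true));     // false
        System.out.println(laterWriteSucceeds(false));    // true
        System.out.println(privateInstanceCloseIsSafe()); // true
    }
}
```

In real Hadoop code, the corresponding options would be to close only the stream or writer and leave the instance from FileSystem.get() open, or, if the code really needs to close its file system, to request a private one with FileSystem.newInstance(uri, conf) and close that instead.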