When I look at my logs, I see that my Oozie Java actions are actually running on multiple machines.
I assume that is because they're wrapped inside an M/R job. Is this correct?
Is there a way to have only a single instance of the Java action executing on the entire cluster?
The Java action runs inside an Oozie "launcher" job, with just one YARN "map" container.
The trick is that every YARN job requires an application master (AM) container for coordination.
So you end up with 2 containers, _0001 for the AM and _0002 for the Oozie action, probably on different machines.
To control the resource allocation for each one, you can set the following Action properties to override your /etc/hadoop/conf/*-site.xml config and/or hard-coded defaults (which are specific to each version and each distro, by the way):
- oozie.launcher.yarn.app.mapreduce.am.resource.mb
- oozie.launcher.yarn.app.mapreduce.am.command-opts (to align the max heap size with the global memory max)
- oozie.launcher.mapreduce.map.memory.mb
- oozie.launcher.mapreduce.map.java.opts (...)
- oozie.launcher.mapreduce.job.queuename (in case you've got multiple queues with different priorities)

And I have no idea how they do that. Maybe there's a generic YARN config somewhere, maybe it's a Cloudera-specific feature.
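
For illustration, here is a rough sketch of how the launcher properties listed above could be set in the configuration block of a Java action in workflow.xml. The action name, main class, transition targets, queue name, and all memory values are placeholders I made up, not recommendations; pick sizes that fit your own cluster, and keep each -Xmx somewhat below the matching container size.

```xml
<!-- Sketch only: names and values below are hypothetical placeholders -->
<action name="my-java-action">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Container _0001: the application master of the launcher job -->
      <property>
        <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
        <value>1024</value>
      </property>
      <property>
        <name>oozie.launcher.yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx768m</value>
      </property>
      <!-- Container _0002: the single "map" task that runs the Java action -->
      <property>
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>2048</value>
      </property>
      <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx1536m</value>
      </property>
      <!-- YARN queue for the launcher job (assumed queue name) -->
      <property>
        <name>oozie.launcher.mapreduce.job.queuename</name>
        <value>default</value>
      </property>
    </configuration>
    <main-class>com.example.MyMain</main-class>
  </java>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

The oozie.launcher. prefix is what makes these settings apply to the launcher job itself rather than to any job the Java code might submit on its own.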