hadoop, oozie

Where to put oozie.launcher.* configuration?


While trying to use Oozie properly, I ended up setting a few parameters, namely:

  • oozie.launcher.mapreduce.map.memory.mb
  • oozie.launcher.mapreduce.map.java.opts
  • oozie.launcher.yarn.app.mapreduce.am.resource.mb
  • oozie.launcher.mapred.job.queue.name

If I set them in the workflow configuration, they work as expected.

Is there a place to set them globally, i.e., not per workflow? I expected custom-oozie-site.xml to be the right place, but the parameters have no effect when set there. Is the workflow itself the only place where they can be configured?

In case it is relevant, I am using HDP 2.5.


Solution

  • The Parameterization of Workflows section of the Oozie documentation states:

    Workflow applications may define default values for the workflow job parameters. They must be defined in a config-default.xml file bundled with the workflow application archive... Workflow job properties have precedence over the default values.

    Another option I've seen is to define a parent workflow and propagate its configuration to the child workflows. Granted, this only works in specific cases and isn't always a good idea.
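
    As a rough sketch of that parent/child approach, a parent workflow can call a child through a sub-workflow action and pass its job configuration along with the propagate-configuration element. The workflow name, the action and node names, and the childWorkflowPath variable below are hypothetical, and whether the oozie.launcher.* properties actually take effect in the child's launcher this way is an assumption to verify on your cluster:

    ```xml
    <!-- Hypothetical parent workflow; names and paths are placeholders -->
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="parent-wf">
        <start to="run-child"/>
        <action name="run-child">
            <sub-workflow>
                <!-- childWorkflowPath is assumed to be set in the parent's job properties -->
                <app-path>${childWorkflowPath}</app-path>
                <!-- Passes the parent's job configuration on to the child workflow job -->
                <propagate-configuration/>
            </sub-workflow>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Child workflow failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    ```

    With this layout the shared properties are set once, in the parent's job configuration, instead of being repeated in every child workflow.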

    In addition, the Workflow Deployment section of the documentation notes:

    The config-default.xml file defines, if any, default values for the workflow job parameters. This file must be in the Hadoop Configuration XML format. EL expressions are not supported and user.name property cannot be specified in this file. Any other resources like job.xml files referenced from a workflow action node must be included under the corresponding path, relative paths always start from the root of the workflow application.
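
    To make the format concrete, here is a minimal sketch of a config-default.xml that carries the four properties from the question. All values are placeholders I made up for illustration, not recommendations, and I am assuming (not confirming) that Oozie applies oozie.launcher.* defaults defined here to the launcher job, so test it before relying on it:

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- config-default.xml, placed next to workflow.xml in the workflow application directory -->
    <!-- Plain Hadoop Configuration XML: no EL expressions, no user.name property -->
    <configuration>
        <property>
            <name>oozie.launcher.mapreduce.map.memory.mb</name>
            <value>2048</value> <!-- placeholder value -->
        </property>
        <property>
            <name>oozie.launcher.mapreduce.map.java.opts</name>
            <value>-Xmx1638m</value> <!-- placeholder value -->
        </property>
        <property>
            <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
            <value>1024</value> <!-- placeholder value -->
        </property>
        <property>
            <name>oozie.launcher.mapred.job.queue.name</name>
            <value>default</value> <!-- placeholder queue name -->
        </property>
    </configuration>
    ```

    Note that this is still one file per workflow application rather than a cluster-wide setting, and any workflow job property with the same name overrides the default given here.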

    This is a problem my team is currently trying to fix across 12 different ETL loads.