apache-spark, oozie

Spark and remote properties-files


I am trying to launch an Oozie workflow that launches a Spark job. I need to pass it a properties file, but this properties file has to be on HDFS:

spark-submit --properties-file hdfs:/user/lele/app.properties ....

This doesn't work. Do you have any idea how to resolve this issue? Thanks.


Solution

  • Straight from the Oozie documentation for the Spark extension:

    Spark Action Schema Version 0.2
    ...
       <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>

    (looks like it was forgotten in V0.1 ?!? that was a blunder...)

    And in the Oozie documentation for core Oozie Workflow features

    The file, archive elements make available, to map-reduce jobs, files and archives ... Files specified with the file element will be symbolic links in the home directory of the task.
    Refer to the Hadoop distributed cache documentation for more details on files and archives.

    Unfortunately that's just noise, and it does not explain what file actually does: it downloads an HDFS file into the YARN container(s) running the Oozie action, and makes it available in the current working directory.
    Optionally, you can have the file renamed, e.g. <file>/user/dummy/wtf.conf.V5.2.0#wtf.conf</file> will fetch a specific version from HDFS and make it available to the job under a generic name (see the workflow sketch at the end of this answer).



    Recommended reading: the Hooked on Hadoop tutorial series about Oozie. It's a bit old now, but it is still the best overview of what Oozie can do in practice.
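
    To tie the above together, here is a minimal sketch of a Spark action (schema 0.2) that uses <file> to pull the properties file from HDFS into the container's working directory, then points --properties-file at the local copy. The workflow name, application class, jar path and the /user/lele/app.properties location are illustrative placeholders, not anything prescribed by Oozie:

        <workflow-app name="spark-props-demo" xmlns="uri:oozie:workflow:0.5">
            <start to="spark-node"/>
            <action name="spark-node">
                <spark xmlns="uri:oozie:spark-action:0.2">
                    <job-tracker>${jobTracker}</job-tracker>
                    <name-node>${nameNode}</name-node>
                    <master>yarn-cluster</master>
                    <name>SparkWithProps</name>
                    <class>com.example.MyApp</class>
                    <jar>${nameNode}/user/lele/lib/my-app.jar</jar>
                    <!-- the file lands in the container's working dir,
                         so we reference it by its plain local name -->
                    <spark-opts>--properties-file app.properties</spark-opts>
                    <!-- download the HDFS file into the YARN container;
                         the #app.properties suffix sets the local link name -->
                    <file>${nameNode}/user/lele/app.properties#app.properties</file>
                </spark>
                <ok to="end"/>
                <error to="fail"/>
            </action>
            <kill name="fail">
                <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
            </kill>
            <end name="end"/>
        </workflow-app>

    In yarn-cluster mode the driver runs inside the YARN container, so the symlinked app.properties in the working directory is exactly the kind of local file that --properties-file expects.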