hadoop oozie

Oozie Copy files from one hdfs location to another


I am using the Oozie fs move operation to copy data from one HDFS folder to another. However, if the target already exists, the fs action places the source as a child of the target directory, which is the documented behavior. Is there a way to avoid this and copy only the Avro files from source to target?


Solution

  • If you need to overwrite what is already at the destination, use the -f flag, for example: hdfs dfs -cp -f /sourcepath /Destination_path. You can run the equivalent command from Oozie.

    If you want to copy only the Avro files, pick the extension they share and use a wildcard, something like this: hdfs dfs -cp -f sourcepath/*.avro /Destination_path
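
    As a rough sketch, the wildcard copy above might be run from a workflow like this, assuming a shell action is acceptable in your setup and the hdfs client is on the PATH of the node that runs it. The action name, the "end" and "fail" transition targets, the ${jobTracker} and ${nameNode} properties, and the paths are hypothetical placeholders, not taken from the question:

    <action name="copy-avro">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>hdfs</exec>
            <argument>dfs</argument>
            <argument>-cp</argument>
            <argument>-f</argument>
            <!-- the pattern is passed as a literal argument; hdfs dfs expands it against HDFS -->
            <argument>/sourcepath/*.avro</argument>
            <argument>/Destination_path</argument>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    Unlike the fs move, this copies the files and leaves the source in place.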

    There is no direct way to overwrite a folder with the Oozie fs action. Delete the target folder first and then move the source; that way the source does not end up as a child directory of the target.

    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="[NODE-NAME]">
        <fs>
            <delete path='[PATH]'/>
            ...
            <mkdir path='[PATH]'/>
            ...
            <move source='[SOURCE-PATH]' target='[TARGET-PATH]'/>
            ...
            <chmod path='[PATH]' permissions='[PERMISSIONS]' dir-files='false' />
            ...
        </fs>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
    </workflow-app>
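
    Filled in for the delete-then-move case, the template above might look like the sketch below. The action name, the "end" and "fail" transition targets, the ${nameNode} property, and the /data/... paths are hypothetical. It assumes the parent directory of the target already exists:

    <action name="replace-target">
        <fs>
            <!-- remove the existing target so the move does not nest the source under it -->
            <delete path='${nameNode}/data/target'/>
            <!-- with the target gone, the source directory is moved to the target path itself -->
            <move source='${nameNode}/data/source' target='${nameNode}/data/target'/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    Note that this moves the whole source directory rather than copying it, so it does not select only the Avro files.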
    

    Please refer to the Oozie documentation for more information on the fs action.

    Hope this helps. Comment on the answer if you have any questions.