Tags: docker, workflow, oozie, oozie-workflow

Can Apache Oozie run docker containers?


I'm currently comparing DAG-based workflow tools such as Airflow and Luigi for scheduling generic Docker containers as well as Spark jobs.

Can Apache Oozie run generic Docker containers through its shell action? Or is Oozie strictly meant for Hadoop tools like Pig and Hive?

Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).


Solution

  • I ran Docker containers through the Shell action and it works. Because a Shell action can be executed on any node of the cluster, Docker must be installed on every node.

    workflow.xml created from Hue

    <workflow-app name="Test docker" xmlns="uri:oozie:workflow:0.5">
        <start to="shell-5c29"/>
        <kill name="Kill">
            <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="shell-5c29">
            <shell xmlns="uri:oozie:shell-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <exec>test_docker.sh</exec>
                <file>/test_docker.sh#test_docker.sh</file>
            </shell>
            <ok to="End"/>
            <error to="Kill"/>
        </action>
        <end name="End"/>
    </workflow-app>
    

    test_docker.sh

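    #!/bin/bash
    set -e  # exit on the first failing command so the Oozie action is marked as failed
    # Run the container on whichever node executes the action and capture its output
    docker run hello-world > output.txt
    hdfs dfs -put -f output.txt /output.txt  # copy the result to HDFS, overwriting any old copy
    echo 'done'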
    

    Content of the generated output.txt

    Hello from Docker!
    This message shows that your installation appears to be working correctly.
    
    To generate this message, Docker took the following steps:
     1. The Docker client contacted the Docker daemon.
     2. The Docker daemon pulled the "hello-world" image from the Docker Hub (amd64)
     3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
     4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.
    
    To try something more ambitious, you can run an Ubuntu container with:
     $ docker run -it ubuntu bash
    
    Share images, automate workflows, and more with a free Docker ID:
     https://hub.docker.com/
    
    For more examples and ideas, visit:
     https://docs.docker.com/get-started/
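
Since the Shell action may land on any node, a script like test_docker.sh could check for Docker before calling `docker run`. The following is a minimal sketch of such a guard, not something Oozie or Docker provides: `require_docker` and the `DOCKER_BIN` variable are hypothetical names introduced here for illustration, and the daemon check assumes that `docker info` exits non-zero when the daemon cannot be reached, as recent Docker versions do.

```shell
#!/bin/bash
# Hypothetical guard for an Oozie shell-action script (names are illustrative only).
# DOCKER_BIN lets the caller point at a different Docker client binary.
DOCKER_BIN="${DOCKER_BIN:-docker}"

# Return 0 if the Docker client is on PATH and the daemon answers, 1 otherwise.
require_docker() {
    # Step 1: is the client available on this node at all?
    if ! command -v "$DOCKER_BIN" >/dev/null 2>&1; then
        echo "ERROR: '$DOCKER_BIN' not found on ${HOSTNAME:-this node}" >&2
        return 1
    fi
    # Step 2: 'docker info' contacts the daemon, so it also catches a stopped
    # daemon or a user without permission to use the Docker socket.
    if ! "$DOCKER_BIN" info >/dev/null 2>&1; then
        echo "ERROR: Docker daemon not reachable on ${HOSTNAME:-this node}" >&2
        return 1
    fi
    return 0
}
```

In test_docker.sh this would be called as `require_docker || exit 1` before the `docker run` line, so a node without a working Docker installation fails the action with an explicit message on stderr instead of a bare "command not found".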