hadoop, docker, hdfs, mesos, marathon

How to mount HDFS in a Docker container


I have Dockerized an application, and I want the application to be able to access files from our HDFS. The Docker image is deployed via Marathon-Mesos on the same cluster where HDFS is installed.

Below is the JSON that gets POSTed to Marathon. It seems that my app is able to read and write files in HDFS. Can someone comment on the safety of this approach? Will files changed by my app be changed correctly in HDFS as well? I Googled around and didn't find any similar approaches.

{
  "id": "/ipython-test",
  "cmd": null,
  "cpus": 1,
  "mem": 1024,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [
      {
        "containerPath": "/home",
        "hostPath": "/hadoop/hdfs-mount",
        "mode": "RW"
      }
    ],
    "docker": {
      "image": "my/image",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 8888,
          "hostPort": 0,
          "servicePort": 10061,
          "protocol": "tcp",
        }
      ],
      "privileged": false,
      "parameters": [],
      "forcePullImage": true
    }
  },
  "portDefinitions": [
    {
      "port": 10061,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
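
For reference, I submit this definition to Marathon's /v2/apps REST endpoint along these lines (a sketch; marathon.example.com:8080 stands in for our actual Marathon master, and the JSON above is saved as ipython-test.json):

    # POST the app definition to Marathon
    curl -X POST http://marathon.example.com:8080/v2/apps \
         -H "Content-Type: application/json" \
         -d @ipython-test.json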

Solution

  • You might have a look at the Docker volume docs.

    Basically, the volumes definition in the app.json triggers the start of the Docker image with the flag -v /hadoop/hdfs-mount:/home:rw, meaning that the host path gets mapped into the Docker container as /home in read-write mode.

    You should be able to verify this if you SSH into the node that is running the app and run docker inspect <containerId> (see the sketch below).

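    For illustration, the generated invocation is roughly equivalent to the following (a sketch; my/image comes from the app definition above, <containerId> is whatever docker ps reports, and note that Docker itself expects the mode in lowercase):

        # What Marathon/Mesos effectively runs for the volumes stanza above
        docker run -v /hadoop/hdfs-mount:/home:rw my/image

        # On the agent node, confirm the bind mount on the running container
        docker inspect --format '{{json .Mounts}}' <containerId>

    The inspect output should list a bind mount with Source /hadoop/hdfs-mount, Destination /home, and RW set to true.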