
Set up dockerized Mesos-DNS on a Mesos cluster


I'm having trouble running mesos-dns in a Docker container on a Mesos cluster.

I've set up two virtual machines running Ubuntu Trusty on a Windows 8.1 host. The VMs are called docker-vm and docker-sl-vm; the first runs mesos-master and the second runs mesos-slave.

Each VM has two network adapters: one using NAT to access the internet through the host, and one Host-only adapter for internal communication.

The IPs for the VMs are:

  • 192.168.56.101 for docker-vm
  • 192.168.56.102 for docker-sl-vm

The Mesos cluster is running fine.

I am trying to follow this tutorial, so I am running mesos-dns with the following Marathon app definition:

{
    "args": [
        "/mesos-dns",
        "-config=/config.json"
    ],
    "container": {
        "docker": {
            "image": "mesosphere/mesos-dns",
            "network": "HOST"
        },
        "type": "DOCKER",
        "volumes": [
            {
                "containerPath": "/config.json",
                "hostPath": "/usr/local/mesos-dns/config.json",
                "mode": "RO"
            }
        ]
    },
    "cpus": 0.5,
    "mem": 256,
    "id": "mesos-dns",
    "instances": 1,
    "constraints": [["hostname", "CLUSTER", "docker-sl-vm"]]
}

and this config.json:

{
    "zk": "zk://192.168.56.101:2181/mesos",
    "refreshSeconds": 60,
    "ttl": 60,
    "domain": "mesos",
    "port": 53,
    "resolvers": ["8.8.8.8"],
    "timeout": 5,
    "email": "root.mesos-dns.mesos"
}

I am also running a test application called peek with the following definition:

{
  "id": "peek",
  "cmd": "env >env.txt && python3 -m http.server 8080",
  "cpus": 0.5,
  "mem": 32.0,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "python:3",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0 }
      ]
    }
  }
}

PROBLEM

In the tutorial, the command dig _peek._tcp.marathon.mesos SRV returns the following answer:

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> _peek._tcp.marathon.mesos SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57329
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;_peek._tcp.marathon.mesos. IN  SRV

;; ANSWER SECTION:
_peek._tcp.marathon.mesos. 60   IN  SRV 0 0 31000 peek-27346-s0.marathon.mesos.

;; ADDITIONAL SECTION:
peek-27346-s0.marathon.mesos. 60 IN A   10.141.141.10

;; Query time: 4 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Oct 24 23:21:15 UTC 2015
;; MSG SIZE  rcvd: 160

Here we can clearly see the port and IP bound to _peek._tcp.marathon.mesos. But when I run the same command on my slave machine, which is the one running this container, I get this result:

docker@docker-sl-vm:~$ dig _peek._tcp.marathon.mesos SRV

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> _peek._tcp.marathon.mesos SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 33415
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;_peek._tcp.marathon.mesos. IN  SRV

;; AUTHORITY SECTION:
.           10791   IN  SOA a.root-servers.net. nstld.verisign-grs.com. 2015102801 1800 900 604800 241

;; Query time: 1 msec
;; SERVER: 10.10.11.1#53(10.10.11.1)
;; WHEN: Wed Oct 28 17:06:30 BRT 2015
;; MSG SIZE  rcvd: 129

It looks like mesos-dns can't resolve _peek._tcp.marathon.mesos SRV.

Does anyone know why and how to fix it?

Thank you in advance...

UPDATE

Contents of /etc/resolv.conf:

nameserver 10.10.11.1
nameserver 10.10.10.7
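
The SERVER line in each dig output shows which resolver answered: 127.0.0.1 in the tutorial's run, and 10.10.11.1 in the run on docker-sl-vm, which matches the first nameserver listed above. As a small sketch for checking this quickly, the function below pulls that address out of dig output; the name `dig_server` is made up for illustration and is not part of dig or Mesos-DNS.

```shell
#!/bin/sh
# Hypothetical helper (not part of dig or Mesos-DNS): read dig output on
# stdin and print the address of the resolver that answered, taken from
# the ";; SERVER: ip#port(ip)" line.
dig_server() {
    sed -n 's/^;; SERVER: \([^#]*\)#.*/\1/p'
}
```

It would be used as `dig _peek._tcp.marathon.mesos SRV | dig_server`. If the address printed is not the host where mesos-dns runs, the query never reached mesos-dns.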

Solution

  • Have a look at the Mesos DNS docs regarding Slave Setup:

    To allow Mesos tasks to use Mesos-DNS as the primary DNS server, you must edit the file /etc/resolv.conf in every slave and add a new nameserver. For instance, if mesos-dns runs on the server with IP address 10.181.64.13, you should add the line nameserver 10.181.64.13 at the beginning of /etc/resolv.conf on every slave node.

    I think the local IP address (192.168.56.102) is missing from your /etc/resolv.conf.

    Otherwise, you can also try my minimal Mesos DNS image, but you'd still have to edit the above file.
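
A minimal sketch of the edit described above, assuming mesos-dns is reachable at 192.168.56.102 (the slave's Host-only address from the question). The function name `prepend_nameserver` is hypothetical; it puts the given nameserver on the first line of a resolv.conf-style file and does nothing if it is already there.

```shell
#!/bin/sh
# Hypothetical helper: make "nameserver <ip>" the first line of a
# resolv.conf-style file. Usage: prepend_nameserver FILE IP
prepend_nameserver() {
    file=$1
    ip=$2
    # Nothing to do if this nameserver is already the first line.
    if [ "$(head -n 1 "$file")" = "nameserver $ip" ]; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # New entry first, then the existing lines minus any older copy of it.
    {
        printf 'nameserver %s\n' "$ip"
        grep -v -x -F "nameserver $ip" "$file" || true
    } > "$tmp"
    # Write back through the original path so owner and permissions stay.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

On the slave this would be run as root, for example `prepend_nameserver /etc/resolv.conf 192.168.56.102`. Afterwards, `dig @192.168.56.102 _peek._tcp.marathon.mesos SRV` sends the query straight to that address, which is a quick way to confirm mesos-dns itself answers. On Ubuntu Trusty, /etc/resolv.conf may be regenerated by resolvconf, so a manual edit might not survive a reboot or DHCP renewal.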