nomad woes: NOMAD_IP_myport is my external interface IP, even though I'm using network.mode = "bridge"


I've set up a nomad cluster on 2 VPS, one with nomad server, consul server and vault, and another with nomad and consul client.

On the client node I'm trying to run nomad jobs, e.g. a postgres service that should be reachable by other containers on an internal network. So I chose bridge mode like so:

    group "myservices" {
        count = 1

        network {
            mode = "bridge"
            port "postgrestcp" {
                to = 5432 
            }
        }

        service {
            name = "svc-postgres"
            port = "postgrestcp"
            tags = ["postgres","primary"]
   ...
        task "task-postgres" {
            driver = "docker"

            config {
                image = "docker.io/postgres:16-alpine"
                ports = ["postgrestcp"]
            }
   ...

When I start the job and exec into the started container (nomad exec -task task-postgres -t $(nomad job status myjob-postgres | grep -A2 Alloc | tail -n1 | awk '{ print $1 }') bash)

I find that all relevant NOMAD_ADDR... env vars in the container contain my external interface IP instead of the IP of the default bridge (i.e. with the name nomad). That bridge exists and has an IP which I expect to find in the mentioned env vars.

> networkctl status nomad | grep -e Type -e '^\s*Address:'
                          Type: bridge
                       Address: 172.26.64.1

No matter what I try I can't get rid of nomad referring to the public IP. (I first thought it was because I was using the podman task driver before, but changing to docker didn't make a difference).

What's going on here?


Solution

  • In a Nomad network block, mode = "bridge" means that Nomad creates a bridge network shared by the tasks in the group. It is not the same thing as Docker's "bridge" interface. See https://developer.hashicorp.com/nomad/docs/networking#bridge-networking.

    Nomad forwards the port on the host interface that is configured in the Nomad client configuration, and that host address is what ends up in the NOMAD_IP_... and NOMAD_ADDR_... variables. See https://developer.hashicorp.com/nomad/docs/configuration/client#host_network-block and https://developer.hashicorp.com/nomad/docs/job-specification/network#host_network . The default interface is the one the client uses to connect, i.e. the "external" one. I typically also add host_network "lo" { interface = "lo" } so that localhost is available as well. Consider using a firewall to keep outside traffic away from your services. Typically, when you have many servers, you do want the port on an external interface so that services on other nodes can reach it.
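
    As a rough sketch of how the two pieces fit together (not a drop-in configuration): the first part below belongs in the Nomad client agent configuration, the second in the job file from the question. The name "lo" for the host network is just the label used above; whether binding the port to loopback suits your setup is an assumption you should check, since services on other nodes will then no longer be able to reach it.

    ```hcl
    # --- Nomad client agent configuration (separate file from the job) ---
    client {
      enabled = true

      # Declare the loopback interface as a named host network.
      # "lo" is only a label; the job refers to it by this name.
      host_network "lo" {
        interface = "lo"
      }
    }

    # --- Job file: network block inside group "myservices" ---
    network {
      mode = "bridge"

      port "postgrestcp" {
        to = 5432

        # Map the port on the "lo" host network instead of the default
        # (external) interface.
        host_network = "lo"
      }
    }
    ```

    With this in place the port mapping should be bound to the loopback address on the host rather than the public one. The client agent has to be restarted for the new host_network to be picked up.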

    Other containers can connect to the service using the IP and port that were assigned to it. Within the same group in the job, use ${NOMAD_ADDR_postgrestcp} to connect to it. From outside the group, i.e. from other tasks and anything else, you have to look that information up from Nomad (or from Consul, where the service is registered).

    In our network, we run fabio and consul. For HTTP services, we just pick a name, add a tag of the form urlprefix-name.service.consul.our.domain to the Consul service tags within the job, and open that URL in the browser. For other services, templating is most commonly used to generate the configuration of the consuming service, like airflow connecting to postgres or redis.
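
    A minimal sketch of the templating approach, assuming the consuming task runs in a different group or job and that Nomad is integrated with Consul as in the question. The task name "task-app", the image and the PGHOST/PGPORT variable names are hypothetical; only the service name svc-postgres comes from the job above.

    ```hcl
    task "task-app" {
      driver = "docker"

      config {
        # Hypothetical image, replace with the real consumer.
        image = "example/app:latest"
      }

      # Look up svc-postgres in Consul and expose its address and port
      # to the task as environment variables.
      template {
        data = <<-EOH
          {{- range service "svc-postgres" }}
          PGHOST={{ .Address }}
          PGPORT={{ .Port }}
          {{- end }}
        EOH

        destination = "secrets/db.env"
        env         = true
      }
    }
    ```

    If several instances of the service are registered, the loop emits one pair per instance and the last one wins, so for anything beyond a single instance you would want to pick one explicitly.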