Tags: dns, consul, dnsmasq

Consul endpoint doesn't resolve: nslookup works but ping does not


I don't have systemd-resolved; I have installed dnsmasq instead. nslookup returns the Consul servers and the master-tagged server, but it doesn't return the replica-tagged servers, and I can't ping any of the Consul domains.

I have systemd 219 and OL7. Currently node01 is the master. I am trying to ping the service name, or connect to it with psql, from one of the Consul servers itself.

bash-4.2$ psql -U postgres -h master.postgres-cluster.service.consul -p 6432
psql: could not translate host name "master.postgres-cluster.service.consul" to address: Name or service not known
bash-4.2$ consul members
Node   Address        Status  Type    Build   Protocol  DC   Partition  Segment
node1  node1_ip:8301  alive   server  1.14.4  2         dc1  default    <all>
node2  node2_ip:8301  alive   server  1.14.4  2         dc1  default    <all>
node3  node3_ip:8301  alive   server  1.14.3  2         dc1  default    <all>
    
bash-4.2$ dig @127.0.0.1 -p 8600 consul.service.consul
    
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.8 <<>> @127.0.0.1 -p 8600 consul.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45170
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;consul.service.consul.         IN      A

;; ANSWER SECTION:
consul.service.consul.  0       IN      A       node2_ip
consul.service.consul.  0       IN      A       node3_ip
consul.service.consul.  0       IN      A       node1_ip

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Feb 14 17:46:38 GMT 2023
;; MSG SIZE  rcvd: 98

bash-4.2$ dig @127.0.0.1 -p 8600 postgres-cluster.service.consul

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.8 <<>> @127.0.0.1 -p 8600 postgres-cluster.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31501
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;postgres-cluster.service.consul. IN    A

;; ANSWER SECTION:
postgres-cluster.service.consul. 0 IN   A       node1_ip

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Feb 14 17:46:46 GMT 2023
;; MSG SIZE  rcvd: 76

bash-4.2$ dig @127.0.0.1 -p 8600 consul.service.consul

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.8 <<>> @127.0.0.1 -p 8600 consul.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31321
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;consul.service.consul.         IN      A

;; ANSWER SECTION:
consul.service.consul.  0       IN      A       node1_ip
consul.service.consul.  0       IN      A       node3_ip
consul.service.consul.  0       IN      A       node2_ip

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Feb 14 18:02:48 GMT 2023
;; MSG SIZE  rcvd: 98

bash-4.2$ dig @127.0.0.1 -p 8600 postgres-cluster.service.consul

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.8 <<>> @127.0.0.1 -p 8600 postgres-cluster.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31556
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;postgres-cluster.service.consul. IN    A

;; ANSWER SECTION:
postgres-cluster.service.consul. 0 IN   A       node1_ip

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Feb 14 18:03:06 GMT 2023
;; MSG SIZE  rcvd: 76

bash-4.2$ ping master.postgres-cluster.service.consul
ping: master.postgres-cluster.service.consul: Name or service not known
bash-4.2$ dig @127.0.0.1 -p 8600 replica.postgres-cluster.service.consul

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.8 <<>> @127.0.0.1 -p 8600 replica.postgres-cluster.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 35176
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;replica.postgres-cluster.service.consul. IN A

;; AUTHORITY SECTION:
consul.                 0       IN      SOA     ns.consul. hostmaster.consul. 1676397839 3600 600 86400 0

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Feb 14 18:03:59 GMT 2023
;; MSG SIZE  rcvd: 118

bash-4.2$ dig @127.0.0.1 -p 8600 master.postgres-cluster.service.consul

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.8 <<>> @127.0.0.1 -p 8600 master.postgres-cluster.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20545
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;master.postgres-cluster.service.consul.        IN A

;; ANSWER SECTION:
master.postgres-cluster.service.consul. 0 IN A  node1_ip

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Feb 14 18:04:18 GMT 2023
;; MSG SIZE  rcvd: 83

bash-4.2$ cat /etc/resolv.conf
search  ourdomain.com
nameserver      our_ns1
nameserver      our_ns2
nameserver      127.0.0.1

If I don't specify the nameserver in nslookup, the lookup fails:

bash-4.2$ nslookup consul.service.consul
Server:         ns1
Address:        ns1#53

** server can't find consul.service.consul: NXDOMAIN
bash-4.2$  nslookup postgres-cluster.service.consul 127.0.0.1 -port=8600
Server:         127.0.0.1
Address:        127.0.0.1#8600

Name:   postgres-cluster.service.consul
Address: pgnod01_ip (current master)

bash-4.2$  nslookup consul.service.consul 127.0.0.1 -port=8600
Server:         127.0.0.1
Address:        127.0.0.1#8600

Name:   consul.service.consul
Address: pgnode03_ip
Name:   consul.service.consul
Address: pgnode02_ip
Name:   consul.service.consul
Address: pgnode01_ip

Solution

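  • The resolver only moves on to the next nameserver in resolv.conf if a query times out. An NXDOMAIN reply from our_ns1 is a valid answer, so the lookup ends there and 127.0.0.1 is never tried. From the resolv.conf(5) man page: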

    If there are multiple servers, the resolver library queries them in the order listed. If no nameserver entries are present, the default is to use the name server on the local machine. (The algorithm used is to try a name server, and if the query times out, try the next, until out of name servers, then repeat trying all the name servers until a maximum number of retries are made.)

    Given your use case, I would keep only the dnsmasq server (127.0.0.1) in resolv.conf and configure dnsmasq to forward .consul queries to the Consul DNS port and everything else to your existing nameservers.

    Example:

    resolv.conf:

    cat /etc/resolv.conf
    search  ourdomain.com
    nameserver      127.0.0.1
    

    dnsmasq configuration:

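    # Send queries for the .consul domain to the local Consul DNS port
    server=/consul/127.0.0.1#8600
    # Forward everything else to the existing nameservers
    server=our_ns1
    server=our_ns2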
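
To confirm the change took effect, it helps to test through the system resolver rather than with dig, since ping and psql resolve names through NSS (getaddrinfo) and only ever query the servers listed in resolv.conf. The following is a minimal sketch, assuming a POSIX shell with awk and getent available. The function names list_nameservers, resolves_via_system, and check_setup are made up for illustration and are not part of Consul or dnsmasq.

```shell
#!/bin/sh
# Sketch only: check that the system resolver (not just dig) can see Consul names.
# Assumes dnsmasq listens on 127.0.0.1:53 and forwards .consul to 127.0.0.1#8600.

# Print the nameservers from a resolv.conf-style file, in the order
# the resolver library tries them.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Resolve a name through NSS, the same path ping and psql use.
# Returns non-zero if the name cannot be resolved.
resolves_via_system() {
    getent hosts "$1" >/dev/null 2>&1
}

# Hypothetical driver: fail if dnsmasq (127.0.0.1) is not the first
# nameserver, or if the given name does not resolve via the system resolver.
# Note: plain sh has no local variables, so conf, name, and first are global.
check_setup() {
    conf="${1:-/etc/resolv.conf}"
    name="${2:-master.postgres-cluster.service.consul}"
    first=$(list_nameservers "$conf" | head -n 1)
    if [ "$first" != "127.0.0.1" ]; then
        echo "first nameserver is '$first', not dnsmasq (127.0.0.1)"
        return 1
    fi
    if resolves_via_system "$name"; then
        echo "$name resolves via the system resolver"
    else
        echo "$name does not resolve via the system resolver"
        return 1
    fi
}
```

After restarting dnsmasq, sourcing this file and running check_setup with no arguments checks /etc/resolv.conf and the master service name. With the original resolv.conf from the question it would stop at the first check, because our_ns1 is listed before 127.0.0.1.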