
HashiCorp Consul: invalid certificate, certificate is not valid for server.dc1.consul.hello.com


I am building a development environment in Docker, using multiple containers with docker-compose. Inside Docker I also run my own private certificate authority (CA) server. Each container has its own certificate signed by my private CA, like this:

  • ds.hello.com.crt (directory service)
  • am.hello.com.crt (access manager)
  • consul.hello.com.crt
  • my-service-1.hello.com.crt
  • ...

I do not want to change this domain name structure.

The Docker container name and the hostname of the Consul server are both consul.hello.com. But when I start Consul, it complains about the certificate:

[ERROR] agent.server.rpc: failed to read byte: conn=from=127.0.0.1:56931 error="remote error: tls: bad certificate"
[WARN]  agent: [core][Channel #1 SubChannel #5] grpc: addrConn.createTransport failed to connect to {Addr: "dc1-127.0.0.1:8300", ServerName: "agent-one", }. Err: connection error: desc = "transport: Error while dialing: tls: failed to verify certificate: x509: certificate is valid for consul.hello.com, not server.dc1.consul.hello.com"
[WARN]  agent: error getting server health from server: server=agent-one error="rpc error getting client: failed to get conn: tls: failed to verify certificate: x509: certificate is valid for consul.hello.com, not server.dc1.consul.hello.com"
[ERROR] agent.server.rpc: failed to read byte: conn=from=127.0.0.1:59375 error="remote error: tls: bad certificate"

This is the relevant part of my Consul config file:

{
  "domain": "hello.com",
  "bootstrap": true,
  "server": true,
  "log_level": "debug",
  "datacenter": "consul",
  "encrypt": "f502FprfDugOXiWqZUJpyAwgeXH+6qs6VNFjPxX8TgU=",
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "node_name": "agent-one",
  "data_dir": "/tmp/consul-data",
  "leave_on_terminate": false,
  "skip_leave_on_interrupt": true,
  "ui_config": {
    "enabled": true
  },
  "addresses": {
    "https": "0.0.0.0"
  },
  "ports": {
    "http": -1,
    "https": 8501
  },
  "tls": {
    "defaults": {
      "key_file": "/tmp/consul.hello.com.plain-key",
      "cert_file": "/tmp/consul.hello.com.crt",
      "ca_file": "/tmp/ca.crt",
      "verify_incoming": true,
      "verify_outgoing": true,
      "verify_server_hostname": true
    }
  }
}

But this config does not work. I can override the domain name and the datacenter name in the config, but that is still not enough:

failed to verify certificate: x509: certificate is valid for consul.hello.com, not server.consul.hello.com

Consul still insists on the server prefix.

I need to keep verify_server_hostname set to true.

How can I remove the server prefix from Consul?


Solution

  • With verify_server_hostname enabled, Consul requires the server certificate to carry a Subject Alternative Name (SAN) of the form server.<datacenter>.<domain>, and my certificate request did not set one.

    I use OpenVPN easy-rsa to generate the certificates, so in my case the fix was to add the following parameter:

    --subject-alt-name="DNS:server.$FQDN"

    where FQDN=consul.hello.com in my environment.
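
The names in the error messages appear to follow the pattern server.<datacenter>.<domain>, which is what Consul checks when verify_server_hostname is true. Below is a minimal shell sketch, not part of the original answer, for computing that name and checking whether a certificate already lists it. The function names expected_server_san and cert_has_san are hypothetical helpers, and the check is a plain substring match on the openssl text dump, so treat it as a quick sanity check rather than strict validation.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post).

# Print the DNS name Consul expects in a server certificate's SAN.
#   $1 = "datacenter" value from the Consul config
#   $2 = "domain" value from the Consul config
expected_server_san() {
  printf 'server.%s.%s\n' "$1" "$2"
}

# Return 0 if the certificate file $1 mentions DNS name $2 in its
# text dump. This is a fixed-string substring match, so a longer name
# that merely starts with $2 would also pass.
cert_has_san() {
  openssl x509 -in "$1" -noout -text | grep -Fq -- "DNS:$2"
}
```

With the config from the question (datacenter "consul", domain "hello.com"), `expected_server_san consul hello.com` yields server.consul.hello.com, and `cert_has_san /tmp/consul.hello.com.crt "$(expected_server_san consul hello.com)"` should succeed once the certificate has been reissued with the extra SAN. If other clients still reach the server as consul.hello.com, that name probably needs to stay in the SAN list as well; this is an assumption worth verifying against the reissued certificate.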