We are running OpenShift Origin 1.4 (OSE 3.4), and the CA cert for etcd expired over the weekend. The cluster appears to still be functioning, but I'm guessing it's a ticking time bomb. Does anyone know of a safe way to update the certificates?
I've seen the link below, but it appears to cover still-valid certs that are about to expire. I have a feeling it will fail as soon as any service is restarted, since our cert has already expired.
https://docs.openshift.org/latest/install_config/redeploying_certificates.html
I resolved this issue yesterday morning. Here's a full description of the situation and what I did to resolve it in case anyone with the same problem sees this.
We are running an OpenShift Origin 1.4 cluster that was originally installed as 1.1 and has been upgraded through every version over the past year. Last Saturday our CA, server, and peer certs for etcd expired. This caused a number of errors in our server logs, but etcd and the OpenShift cluster kept running. However, when I reproduced the same situation in our dev environment and restarted the services, the etcd nodes refused to connect to each other and the OpenShift cluster would not start.
If you are in the same situation, do NOT restart etcd or your master services unless you have a game plan for fixing the issue and are ready to carry it out.
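Before touching any service, it may help to confirm exactly which certs have expired. The following is only a sketch: check_cert_expiry is a helper name I made up, and the server.crt and peer.crt paths in the usage comment are assumptions about a typical layout (only /etc/etcd/ca/ appears in this post), so adjust them to your install.

```shell
# Report whether a certificate is expired or expires within a time window.
# Usage: check_cert_expiry CERT_FILE [WINDOW_SECONDS]
# Prints one status line; returns 0 if the cert stays valid past the window.
check_cert_expiry() {
    cert="$1"
    window="${2:-0}"   # 0 means "is it expired right now?"
    if [ ! -r "$cert" ]; then
        echo "MISSING  $cert"
        return 1
    fi
    # -enddate prints "notAfter=<date>"; strip the prefix for display.
    end=$(openssl x509 -noout -enddate -in "$cert" | cut -d= -f2)
    # -checkend N exits 0 if the cert is still valid N seconds from now.
    if openssl x509 -noout -checkend "$window" -in "$cert" >/dev/null; then
        echo "OK       $cert (notAfter: $end)"
        return 0
    fi
    echo "EXPIRING $cert (notAfter: $end)"
    return 1
}

# Example (paths are assumptions; 2592000 seconds = 30 days):
#   for c in /etc/etcd/ca/ca.crt /etc/etcd/server.crt /etc/etcd/peer.crt; do
#       check_cert_expiry "$c" 2592000
#   done
```

This only reads the files on disk. It does not tell you which cert a running etcd process loaded at startup, which is why the cluster can keep running on certs that this check reports as expired.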
The OpenShift docs for redeploying certificates state that the redeploy-certificates.yml playbook does not regenerate any CA certificates. I tested this in our dev environment and confirmed that it does not regenerate the etcd CA certificate. Neither does the redeploy-etcd-certificates.yml playbook. That means the documented route is to run the redeploy-openshift-ca.yml playbook and then the redeploy-certificates.yml playbook, which leaves you with all new certificates for everything in the cluster. I was pretty sure this would take a significant amount of time and could cause an outage when redeploy-openshift-ca.yml tried to restart etcd and hit the expired server and peer certs.
To fix the issue, I found the command in the redeploy-openshift-ca.yml playbook that generates the etcd CA cert and ran it manually. After that, I ran the redeploy-etcd-certificates.yml playbook.
cd /etc/etcd/ca/
export SAN=etcd-signer
openssl req -config openssl.cnf -newkey rsa:4096 -keyout ca.key \
-new -out ca.crt -x509 -extensions etcd_v3_ca_self -batch \
-nodes -days 1825 -subj /CN=etcd-signer@`date +%s`
ansible-playbook -i hosts_file -vv \
/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-certificates.yml
The redeploy-etcd-certificates playbook failed when it tried to restart the first etcd node, because the other two nodes were still running with the expired certificates. To resolve this, I manually restarted the etcd service on all three nodes, and everything came up properly. I then re-ran the redeploy-etcd-certificates playbook for good measure. It completed properly the second time, and our environment is happy again.
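If you want a sanity check after the redeploy, one option is to confirm that the certs on each etcd node chain to the newly generated CA. This is a sketch rather than part of what I ran: verify_against_ca is a made-up helper name, and the server.crt and peer.crt paths in the usage comment are assumptions about where your install keeps them.

```shell
# Check that a certificate chains to the given CA file.
# Usage: verify_against_ca CA_FILE CERT_FILE
# Relies on the exit status of "openssl verify" (0 on success).
verify_against_ca() {
    ca="$1"
    cert="$2"
    if openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
        echo "VERIFIED $cert"
        return 0
    fi
    echo "FAILED   $cert"
    return 1
}

# Example (paths are assumptions):
#   verify_against_ca /etc/etcd/ca/ca.crt /etc/etcd/server.crt
#   verify_against_ca /etc/etcd/ca/ca.crt /etc/etcd/peer.crt
```

A FAILED line for a server or peer cert would suggest that node is still holding a cert signed by the old CA, which matches the mixed state that made the first playbook run fail.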
@aleks thanks for the help.