Search code examples
sslnginxgoogle-cloud-platformgoogle-compute-enginerhel7

Google Cloud Load Balancer - 502 - Unmanaged instance group failing health checks


I currently have an HTTPS Load Balancer setup operating with a 443 Frontend, Backend and Health Check that serves a single host nginx instance.

When navigating directly to the host via browser the page loads correctly with valid SSL certs.

When trying to access the site through the load balancer IP, I receive a 502 - Server error message. I check the Google logs and I notice "failed_to_pick_backend" errors at the load balancer. I also notice that it failing health checks.

Some digging around leads me to these two links: https://cloudplatform.googleblog.com/2015/07/Debugging-Health-Checks-in-Load-Balancing-on-Google-Compute-Engine.html

https://github.com/coreos/bugs/issues/1195

Issue #1 - Not sure if google-address-manager is running on the server (RHEL 7). I do not see an entry for the HTTPS load balancer IP in the routes. The Google SDK is installed. This is a Google-provided image and if I update the IP address in the console, it also gets updated on the host. How do I check if google-address-manager is running on RHEL7?

[root@server]# ip route ls table local type local scope host
10.212.2.40 dev eth0 proto kernel src 10.212.2.40
127.0.0.0/8 dev lo proto kernel src 127.0.0.1
127.0.0.1 dev lo proto kernel src 127.0.0.1

Output of all google services

[root@server]# systemctl list-unit-files
google-accounts-daemon.service                enabled
google-clock-skew-daemon.service              enabled
google-instance-setup.service                 enabled
google-ip-forwarding-daemon.service           enabled
google-network-setup.service                  enabled
google-shutdown-scripts.service               enabled
google-startup-scripts.service                enabled

Issue #2: Not receiving a 200 OK response. The certificate is valid and the same on both the LB and server. When running curl against the app server I receive this response.

root@server.com  curl -I https://app-server.com
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

Thoughts?


Solution

  • A couple of updates and lessons learned:

    I have found out that "google-address-manager" is now deprecated and replaced by "google-ip-forward-daemon" which is running.

    [root@server ~]# sudo service google-ip-forwarding-daemon status
    Redirecting to /bin/systemctl status google-ip-forwarding-daemon.service
     google-ip-forwarding-daemon.service - Google Compute Engine IP Forwarding Daemon
       Loaded: loaded (/usr/lib/systemd/system/google-ip-forwarding-daemon.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2017-12-22 20:45:27 UTC; 17h ago
     Main PID: 1150 (google_ip_forwa)
       CGroup: /system.slice/google-ip-forwarding-daemon.service
               └─1150 /usr/bin/python /usr/bin/google_ip_forwarding_daemon
    

    There is an active firewall rule allowing IP ranges 130.211.0.0/22 and 35.191.0.0/16 for port 443. The target is also properly set.

    Finally, the health check is currently using the default "/" path. The developers have put an authentication in front of the site during the development process. If I bypassed the SSL cert error, I received a 401 unauthorized when running curl. This was the root cause of the issue we were experiencing. To remedy, we modified nginx basic authentication configuration to disable authentication to a new route (eg. /health)

    Once nginx configuration was updated and the path was updated to the new /health route at the health check, we were receivied valid 200 responses. This allowed the health check to return healthy instances and allowed the LB to pass through traffic