Search code examples
google-cloud-platformgoogle-cloud-load-balancer

Google Cloud Load Balancer suddenly throwing 504 timeouts (despite no change in architecture)


Problem

Sometime overnight, my service began throwing 504 errors on longer running (30+ second) requests despite no recent changes in architecture.

Setup

  • GCP Cloud Run configured with 3600 second timeout
  • GCP Load Balancer with serverless NEG pointing to Cloud Run

Troubleshooting

If I hit the Cloud Run-generated URL directly, I can successfully execute 30+ second requests. If I instead hit the public URL (and thus the load balancer), I get timeouts on longer requests.

Musings

Looking at Google Cloud release notes, there was a change to Load Balancing, but nothing related to timeouts: https://cloud.google.com/release-notes

The current setup has been working flawlessly for over a year.

Update

Modifying the backend timeout is disabled for serverless NEGs, see below: screenshot

Update 2

This seems like a bug with GCP load balancing introduced during the last update, as the default timeout should be 60 minutes, not 30 seconds, as per the documentation: screenshot


Solution

  • Since the issue was related to our prod environment, we have created new load balancer, this time we picked global (classic) LB as an option, got new IP address, and swapped old one at our dns provider. After that everything works. Will probably delete previous LB. I am aware this is not a fix to the problem but more of a workaround, but hey, we got production working and exporting data like before.