Tags: ruby-on-rails, amazon-web-services, nginx, google-compute-engine, thin

Thin web server freezes / times out / stops responding periodically only on GCE not on AWS


I'm using Nginx to forward requests to two Thin web server processes on ports 5000 and 5001. Every once in a while one of the Thin processes will stop responding to requests and Nginx will spit out the following error.

2014/11/28 21:40:05 [error] 21516#0: *1458 upstream timed out (110: Connection timed out) while reading response header from upstream, client: X.X.X.X, server: www.X.com, request: "HEAD / HTTP/1.1", upstream: "http://127.0.0.1:5001/", host: "www.example.com", referrer: "http://www.example.com/"

The affected Thin process stays unresponsive for a couple of minutes and then starts responding again on its own. While it is frozen it also does not respond to wget (e.g. wget http://127.0.0.1:5000) or to a request from Python (e.g. requests.get('http://127.0.0.1:5000')).

I set up three machines: two on Google Compute Engine (Debian 7.7 and Ubuntu 14.04) and one on AWS (Ubuntu 14.04). The error only happens on the Google Compute Engine machines; the Amazon Web Services instance does not have the problem.

The software on all machines is as close to identical as can be. All operating systems are completely up to date through apt-get and the project is pulled from the same Git commit. I use the same deployment method on all three machines and they are all using the same Google Cloud SQL service.

I'm using Thin 1.5.1, Ruby 1.9.3-p448, and Rails 3.2.11. Updating Thin to 1.6.3 did not make a difference.


Solution

  • I worked on this issue for far too long.

    It was timing out because Google Compute Engine closes idle TCP connections after 10 minutes. GCE was cutting Thin's idle connection to the database, and Thin would then just hang on a read from the database. That also explains why AWS wasn't timing out even though it was connected to the same database.

    Using the TCP keep-alive settings suggested at the link above seems to fix the problem, presumably because the connection then never sits idle long enough to be closed.
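
For anyone wondering what "keep-alive" means at the socket level, here is a minimal Ruby sketch. It is not the fix from the link, which is not reproduced here, and it does not touch the database driver, which opens its own socket. It only shows how keep-alive can be switched on for a plain socket so that probes go out well before a 10-minute idle cutoff. The helper names `enable_keepalive` and `keepalive_enabled?` and the 60/60/5 values are made up for illustration, and the per-socket tuning constants are only available on platforms such as Linux.

```ruby
require "socket"

# Hypothetical helper (not part of Thin, Rails, or any database adapter):
# turn on TCP keep-alive for one socket. The default values are
# illustrative assumptions, chosen to be well under a 10-minute idle limit.
def enable_keepalive(sock, idle: 60, interval: 60, probes: 5)
  # Ask the kernel to send keep-alive probes on this connection.
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)

  # Per-socket tuning is platform-specific; skip it where the constants
  # are not defined and fall back to the system-wide defaults.
  if Socket.const_defined?(:TCP_KEEPIDLE)
    # Seconds of idleness before the first probe is sent.
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, idle)
    # Seconds between subsequent probes.
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, interval)
    # Unanswered probes before the connection is considered dead.
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, probes)
  end
  sock
end

# Hypothetical helper: report whether keep-alive is enabled on a socket.
def keepalive_enabled?(sock)
  sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool
end
```

With keep-alive on, an otherwise idle connection still carries periodic probe traffic, and a connection that has been cut off eventually surfaces as an error instead of a read that hangs indefinitely. On Linux the system-wide defaults for the same three values live under `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl` and `net.ipv4.tcp_keepalive_probes`.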