I am repeating a question that I posted at https://forums.aws.amazon.com/thread.jspa?threadID=275855&tstart=0 to reach out more people.
Hi,
I am trying to deploy a REST service in AWS. The current architecture is: Domain name (Route 53) -> Load Balancer -> Single EC2 instance (bound to an Elastic IP). And I use TLS/SSL certificate issued by a Certificate Manager.
The instance is Ubuntu 16.04 machine, and the service is implemented with (bare) Vert.X (==no proxy server).
However, 504 Error (gateway timeout) occurs after a few different requests (each of which takes <1s) in a series, and then it does not respond. The requests do not reach the server instance after a few requests. I checked that it happens in the same way when I access both the domain name and the load balancer directly. I have confirmed that the exact same scenario is working with direct URL.
I run up a dummy server returning "hello world" and it's working okay with the load balancer. The problem should be caused by something no coherent between the load balancer and the server code, but I can't get where to start.
I have checked several threads complaining the 504 errors, and followed some of the instructions, but they do not work. Especially I set keep-alive option in Vert.x and set the idle time longer than the balancer's. As the delays are not longer than the idel time with the direct communication, I believe it is not the problem anyway. I have checked the Security Groups also and confirmed the right ports are open. (The first few requests are working, so it must not be the problem also.)
Does any of you have a sense where I should start looking at? Even better, know the source of the problem?
Thanks in advance.
EDIT: I just found the issue in some of the code. I've answered myself below. Thanks for reading!
Found the issue in my code. Some of the APIs (implemented by my colleague...) was not flushing the buffer of HTTP responses in the server.
In Vert.X Java, it was resp.end()
.
It was somehow working with direct access probably the buffer was flushed at some point, but that flush seems not caught by the load balancer.
Hope nobody experiences this, but in case...