amazon-web-services, amazon-ec2, connection-timeout

Amazon AWS EC2 Instance intermittent timeouts


We have a Java web application deployed on Tomcat on an EC2 instance. It is accessed through an Apache web server that sits behind HAProxy (a 3-tier architecture). Everything works fine most of the time, but every so often I can't connect to the application at all from my machine, and I have to wait (or reboot the machine) before connectivity comes back. When this happens, I can't even SSH to the bastion host. It is as if the whole environment had suddenly gone dead or offline.

It happens from different locations (work, home), at different times (day or night), and on different devices (Windows, Mac, iPad, iPhone). We ran network diagnostic tools on our network at work and came up empty-handed: no issues with the network.

While the problem is occurring on my machine, I can still access the AWS EC2 console and verify that all instances are running OK. But when I try to open the application URL in a browser I keep getting timeouts, and at the same time I can't SSH. If I reboot my machine or just wait, connectivity is restored by itself.

Strangely, it has happened to many people at work at different intervals, and sometimes I am able to connect to the web app while a workmate is not (even though we are on the same network). This is starting to drive us crazy. We are still in our testing phase, but we are getting closer and closer to the go-live date, and we are now worried that our customers will face this intermittent issue as well.

Has anybody got any clue as to what might be causing this issue?


Solution

  • We found the reason behind the issue above. It turned out to be a network configuration problem on the AWS side: a firewall rule was restricting the range of ports that could be accessed externally. Ports are allocated randomly, so when a connection got a port inside the permitted range we could access the service; otherwise access failed.

    The issue is now resolved, and I hope this answer helps anyone who faces a similar issue in the future.
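
To illustrate why a port-range restriction can look like random, intermittent timeouts, here is a minimal sketch in Python. It assumes the randomly allocated ports mentioned above are the client's ephemeral source ports, which the answer does not state explicitly. The function name `allowed_fraction` and the "allowed" range in the usage lines are hypothetical; the two client ranges are common defaults (32768 to 60999 on many Linux systems, 49152 to 65535 as the IANA dynamic range), not values taken from the original setup.

```python
def allowed_fraction(ephemeral, allowed):
    """Return the share of ports in `ephemeral` that also lie in `allowed`.

    Both arguments are (low, high) tuples with inclusive bounds.
    Assumption: each new connection picks a port uniformly at random
    from the ephemeral range, so this share approximates the chance
    that a single connection attempt passes the firewall rule.
    """
    low = max(ephemeral[0], allowed[0])
    high = min(ephemeral[1], allowed[1])
    overlap = max(0, high - low + 1)          # 0 when the ranges are disjoint
    total = ephemeral[1] - ephemeral[0] + 1
    return overlap / total


if __name__ == "__main__":
    # Hypothetical firewall rule that only permits the IANA dynamic range.
    rule = (49152, 65535)

    clients = {
        "IANA dynamic range": (49152, 65535),
        "common Linux default": (32768, 60999),
    }
    for name, ports in clients.items():
        share = allowed_fraction(ports, rule)
        print(f"{name}: {share:.0%} of connections would pass")
```

Under these assumptions, a client whose port range only partly overlaps the permitted range would succeed on some connections and time out on others, which matches the pattern of one person connecting while a colleague on the same network cannot. A rule that permits the full range the clients actually use would avoid the problem.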