Search code examples
apacheamazon-web-servicesamazon-ec2timeoutstrace

Apache threads stops in state ..reading.. until http.conf value of TimeOut is reached


When I check the webserver mod_status /server-status I noticed that there a bunch of threads in state ..reading..

Doing a strace on a thread this is what actually happens when the thread is in ..reading..

...
...
semop(327681, {{0, 1, SEM_UNDO}}, 1)    = 0    
gettimeofday({1452260985, 867058}, NULL) = 0
getsockname(156, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("172.31.9.248")}, [16]) = 0
fcntl(156, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl(156, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
gettimeofday({1452260985, 867479}, NULL) = 0
read(156, 0x558f4c26e9d8, 8000)         = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=156, events=POLLIN}], 1, 300000) = 1 ([{fd=156, revents=POLLIN}])
read(156, "", 8000)                     = 0
gettimeofday({1452261254, 669634}, NULL) = 0
gettimeofday({1452261254, 669691}, NULL) = 0
shutdown(156, SHUT_WR)                  = 0
poll([{fd=156, events=POLLIN}], 1, 2000) = 1 ([{fd=156, revents=POLLIN|POLLHUP}])
read(156, "", 512)                      = 0
close(156)                              = 0
read(6, 0x7fff901f67e7, 1)              = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1452261254, 670341}, NULL) = 0    
semop(327681, {{0, -1, SEM_UNDO}}, 1)   = 0
...
...

When the thread are in ..waiting.. the strace stops at the following line:

poll([{fd=156, events=POLLIN}], 1, 300000) = 1 ([{fd=156, revents=POLLIN}])

The apache config value of "Timout" is set to 30 in this case that reflects the value "300000". This is the timeout value it waits, lowering the configuration value will change the value shown above and it will make the timeout faster.

From my new knowledge in using strace it looks to me that it tries to get a socket to lookup a internal IP-adress. But that is not successful. The setting "HostnameLookups" is off.

Checking our production environment shows that it has the same patterns when Apache stops in ..reading.. but then it shows a IPV6 address pattern.

Example:

getsockname(154, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "::ffff:172.31.3.239", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

And then stops at "poll" and then get the "(Timeout)" as in the example above.

But is there some input why it stops in ..waiting.. ?

Does the "Resource temporarily unavailable" message leave any clue?

If it matters, Apache is running on EC2 instances behind a ELB in Amazon cloud.

Update: Here is a image of how a Production server looks right now with thread states. Lots of ..reading.. Image of Apache thread states

We also are running lots of VirtualHosts on the servers if that gives any clue why this happens.

Closest thread on the World Wide Web I fund with the same problem is this one: http://apache-http-server.18135.x6.nabble.com/Apache-Hangs-Server-Status-shows-all-Reading-td4751342.html


Solution

  • The threads stuck in ..reading.. was caused by a mismatch of "Idle Timeout" on connection settings in ELB compared to the setting of Keepalivetimout in http.conf

    The connection timeout set in ELB was a lot longer than the Keepalivetimout set in Apache configuration. This results in that the Elastic Load Balancer will try to keep open connections while Apache want's to close it. See here http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/config-idle-timeout.html

    After changing the ELB settings to match the settings in Apache config (60 seconds currently) gives the result that I have not got a long list of threads stick in state of R (Reading). They are now set in state K (Keep alive). And this looks more like the expected behavior of threads.