Search code examples
docker.net-coreamazon-ecsasp.net-core-2.1

Troubleshooting dotnet process in docker container


We are building a microservices platform based on dotnet core and docker. We are hosting this inside AWS using ECS running linux containers on linux hosts.

I am running into a problem with the dotnet processes stuck at 100% cpu on the host after a load test is finished while receiving no traffic. I have been trying to troubleshoot some performance issues related to this we have been having and I have done the following things:

  1. Made HttpClient a singleton so it can reuse connections
  2. Increased memory size of my containers from 128mb to 256mb (the dotnet containers wanted to grow larger than 128)

These updates have helped a bit, but I am still seeing strange behavior in the dotnet process running on the host. Locally this problem does not happen, I can run a load test, and while the test is running cpus are high, but once its done they go back down. On the EC2 host the processes say at 100% for a few minutes after.

Has anyone experienced things like this, or have any ideas on how to troubleshoot this? I have tried looking at process information on the host, but can't see much.

Here is an example of what my machine looks like after the load tests have finished but the server is at 100% cpu:

top and docker stats on host

----------- Edit 2018-10-01 -----------

I ran the load tests with the dotnet logging level set to debug, here are the results:

    TIME 18:45:45 - Last requests goes through

dbug: Microsoft.AspNetCore.Server.Kestrel[9]
Connection id "0HLH7PMRMMAFR" completed keep alive response.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
Request finished in 25.9946ms 200 application/json; charset=utf-8
Date=2018-10-01T18:45:45&Service=user&RequestTime=157&PortalId=56&Path=/user/56/v1/user&Method=POST&Action=POST user/{portalid}/v1/user&IPAddress=_IP_&ApiKey=__Key&ResponseCode=200&RequestBody=_BodyData_&Response=_responseData_&ContainerId=f7e4bf541a31&RequestId=
dbug: Microsoft.AspNetCore.Server.Kestrel[9]
Connection id "0HLH7PMRMMAFP" completed keep alive response.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
Request finished in 160.5087ms 200 application/json; charset=utf-8

    TIME 18:46:01 - 18:47:45  See some HealthCheck requests come in

dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Connection id "0HLH7PMRMMAFV" started.
dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Connection id "0HLH7PMRMMAG0" started.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[1]
Request starting HTTP/1.1 GET http://10.0.1.73:32800/apigateway/0/v1/info
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
Request finished in 0.0899ms 200 application/json
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAG0" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAG0" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAG0" stopped.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[1]
Request starting HTTP/1.1 GET http://10.0.1.73:32800/apigateway/0/v1/info
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
Request finished in 0.056ms 200 application/json
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFV" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFV" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFV" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[1]

    TIME: 18:47:45 - 18:47:47 - Connections finally are closed?

dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAGA" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFN" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFM" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFM" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFN" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFM" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFN" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFK" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFL" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFU" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFK" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFU" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFL" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFK" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFU" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFL" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFS" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFS" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFS" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFO" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFP" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFR" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFQ" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel[10]
Connection id "0HLH7PMRMMAFT" disconnecting.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFP" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFO" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFR" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFQ" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Connection id "0HLH7PMRMMAFT" sending FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFP" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFO" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFR" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFQ" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Connection id "0HLH7PMRMMAFT" stopped.

At 18:47:47 the cpu finally went back down. Looks like the problem is with Kestrel keeping connections alive for two minutes and while it does this, CPUs are maxed out.

How should I address? I could look into not respecting the Keep-Alive header on the response: Connection id "0HLH7PMRMMAFR" completed keep alive response. But shouldn't Kestrel continue to reuse this connection instead of creating a new one?

I think I can't reproduce this locally because it must be the AWS ALB inserting a keep alive header?


Solution

  • I think I found the problem!

    https://github.com/aspnet/KestrelHttpServer/issues/2694

    Updated to 2.1.4 and it went away. Its good to keep in mind that if you are using the newest version of a framework always check for any new updates and bugfixes :)