Search code examples
c#iisasp.net-coreiis-10

dotnet process inside IIS crashes. How can I determine cause and remediate?


We are testing an ASP.net core application which crashes while running the dotnet.exe process inside the "reverse proxy" system that IIS automatically uses, if you have installed the IIS .net core hosting option into your machine.

Once the dotnet processes (five in our case) have crashed, all requests return 502 errors, and IIS does not recover by relaunching the dotnet processes.

There is nothing in the IIS logging (in event viewer) to indicate why the pool is crashing, and sometimes the IIS admin UI shows that the pool is "stopped" and sometimes that it is "started", but in task manager, all dotnet.exe processes have gone away.

We're going to bisect (rebuild from previous commits of our code) until we isolate it, but this question is mostly about the ISS and DotNet reverse proxy:

  • Is there anywhere else than Event viewer (eventvwr.msc) that could contain relevant logging on the IIS reverse proxy and the dotnet.exe host process? There are no current items in event viewer that seem relevant. There are messages for days other than the days we have had the crashes, that show WAS starting and restarting IIS pools, but none when the dotnet processes have just gone away.

iis messages in event viewer

  • Is there any known bug in dotnet runtime causing it to crash, and IIS does not recover and restart the process?

  • The only logging I see from IIS is the GET/POST requests, no useful information in C:\inetpub\logs\LogFiles\...

Update: Starting today I now see some relevant event viewer entries: Failed to start process with commandline '"dotnet" .\MyCompany.Authorization.Api.dll', ErrorCode = '0x80004005'. from Source: IIS AspNetCore Module


Solution

  • It seems that IIS itself is crashing in this case, not the dotnet process inside my app pool. My evidence is that the log files in C:\inetpub\logs\LogFiles\W3SVC1 show this, which looks like IIS itself is crashing and being restarted. Note the "#" at the beginning of what looks like some initial bringup/startup logging:

    2016-08-03 13:30:09 W3SVC1 192.168.215.35 GET /api/v3/fhir/patient/311 - 80 - 192.168.215.164 HTTP/1.1 python-requests/2.10.0 - dev-html5.mycompany.biz 404 0 2 1405 1099 0
    2016-08-03 13:31:19 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/code _dc=1470231079494 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 200 0 0 206 544 109
    2016-08-03 13:31:21 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/imagingstudy/297 _dc=1470231081268 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 200 0 0 206 556 312
    2016-08-03 13:31:33 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/code _dc=1470231093098 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 200 0 0 206 544 53
    2016-08-03 13:31:34 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/imagingstudy/297 _dc=1470231094714 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 200 0 0 206 556 59
    #Software: Microsoft Internet Information Services 8.5
    #Version: 1.0
    #Date: 2016-08-03 13:33:03
    #Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
    2016-08-03 13:33:03 W3SVC1 192.168.215.35 GET /api/v3/fhir/organization/1 - 80 - 192.168.215.35 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - dev-html5.mycompany.biz 502 5 0 1627 1255 519
    2016-08-03 13:33:10 W3SVC1 192.168.215.35 GET /api/v3/authorize/login username=admin&password=mycompanyx 80 - 192.168.215.164 HTTP/1.1 python-requests/2.10.0 - dev-html5.mycompany.biz 502 5 0 1627 210 520
    2016-08-03 13:33:23 W3SVC1 192.168.215.35 GET /api/v3/fhir/organization/1 - 80 - 192.168.215.35 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - dev-html5.mycompany.biz 502 5 0 1627 1255 523
    #Software: Microsoft Internet Information Services 8.5
    #Version: 1.0
    #Date: 2016-08-03 13:34:06
    #Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
    2016-08-03 13:34:06 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/linkage/patient/315 _dc=1470231246063 80 - 192.168.225.30 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 502 5 0 1627 511 548
    #Software: Microsoft Internet Information Services 8.5
    #Version: 1.0
    #Date: 2016-08-03 13:34:18
    #Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
    2016-08-03 13:34:18 W3SVC1 192.168.215.35 GET /api/v3/fhir/organization/1 - 80 - 192.168.215.35 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - dev-html5.mycompany.biz 502 5 0 1627 1255 531
    2016-08-03 13:34:41 W3SVC1 192.168.215.35 GET /api/v3/authorize/login username=admin&password=mycompanyx 80 - 192.168.215.164 HTTP/1.1 python-requests/2.10.0 - dev-html5.mycompany.biz 502 5 0 1627 210 531
    2016-08-03 13:34:46 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/code _dc=1470231282672 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 502 5 0 1627 544 777
    2016-08-03 13:34:46 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/imagingstudy/297 _dc=1470231284575 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 502 5 0 1627 556 1037
    2016-08-03 13:34:57 W3SVC1 192.168.215.35 GET /api/v3/fhir/organization/1 - 80 - 192.168.215.35 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - dev-html5.mycompany.biz 502 5 0 1627 1221 530
    2016-08-03 13:35:00 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/code _dc=1470231299728 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 502 5 0 1627 544 527
    

    I am going to try to make a reproducible test base but for now this appears to be related to:

    1. IIS on Windows Server 2012 R2 with the dotnet IIS hosting update installed, and all the latest windows updates.

    2. IIS is hosting multiple applications in a single application pool. This decision (by me) to host multiple applications in a single pool may be "not recommended" by Microsoft due to known problems (like this), but I'll update this answer when I have better information.

    Also it seems that IIS Failed Request Tracing (FREB) might be useful when IIS is falling over. See this article.

    Update: I set up an Azure VM and was able to reproduce, and provide details to Microsoft. Results of that research are posted on IISIntegration Github issue 245 here.. A workaround for the particular issue I was experiencing was suggested to me by someone at Microsoft, the idea of the workaround is:

    • In deployed IIS dotnet core application change relative web.config launch filename to absolute filename, from, for example <aspNetCore processPath="dotnet" arguments=".\yourapp.dll" ... /> to <aspNetCore processPath="dotnet" arguments="c:\full\path\to\yourapp.dll" ... />, and to make code changes to Startup.cs and Program.cs that check when environment is Release, and if so, always get the application current working directory as the web root, instead of the provided Environment web root.

    • At runtime, after each deploy, manually re-apply web.config changes because you can't modify the web.config source (input) files in your project, only the deployed final binary content (after doing dotnet publish).