Tags: asp.net-core, asp.net-core-webapi, reverse-proxy, .net-5

Upgrading from .NET Core 3.1 to .NET 5.0 causes ReverseProxy (YARP) requests to fail with HTTP status 400


We have a .NET Core 3.1 application that uses Microsoft's YARP ReverseProxy, version Preview 8. Our application is a backend-for-frontend (BFF) that hosts a ReactJS SPA, ties into IdentityServer (IS5), and uses the reverse proxy to reach our various APIs. The BFF runs on an IIS server and connects to IS5 and the other APIs through a firewall and a load balancer.

When we upgraded our application to .NET 5, we noticed that all of the API requests were failing with HTTP status 400 (Bad Request). We tried upgrading the reverse proxy to Preview 10, but the errors continued. A few other things we've tried:

  • Configured YARP to use only HTTP/1.1. We made this change because the load balancer (LB) is a TLS-terminating endpoint, so all requests going on to the APIs would be over HTTP, not HTTPS. HTTP/2 is normally negotiated over TLS, and we weren't sure how the LB was handling the conversion. We later saw downgrade requests for the HTTP calls in the logs, so we believe the LB does downgrade HTTP/2 to HTTP/1.1.
  • Added a transform to handle the response headers during a conversion from HTTP/2 to HTTP/1.1. We did this based on issue 583 in the YARP repo.

Here is the resulting proxy configuration.

{
   "ReverseProxy": {
      "Routes": [
         {
            "RouteId": "route_api",
            "ClusterId": "cluster_api",
            "Match": {
               "Path": "/api/{*remainder}"
            },
            "Transforms": [
               {
                  "PathRemovePrefix": "/api"
               },
               {
                  "ResponseHeader": "Connection",
                  "Set": "",
                  "When": "Always"
               },
               {
                  "RequestHeadersCopy": "true"
               }
            ]
         },
         {
            "RouteId": "route_odata",
            "ClusterId": "cluster_odata",
            "Match": {
               "Path": "/odata/{*remainder}"
            },
            "Transforms": [
               {
                  "PathRemovePrefix": "/odata"
               },
               {
                  "ResponseHeader": "Connection",
                  "Set": "",
                  "When": "Always"
               },
               {
                  "RequestHeadersCopy": "true"
               }
            ]
         }
      ],
      "Clusters": {
         "cluster_api": {
            "Destinations": {
               "route_api/destination1": {
                  "Address": "https://api.domain.com/api/v1/",
                  "Version": "1.1"
               }
            }
         },
         "cluster_odata": {
            "Destinations": {
               "cluster_odata/destination1": {
                  "Address": "https://api.domain.com/odata/v1/",
                  "Version": "1.1"
               }
            }
         }
      }
   }
}

Given that we are telling YARP that the destination is going to be HTTP/1.1, we probably don't need the extra transform that sets the Connection response header.

I'm hoping we can get some logs from the firewall/LB, since I'm guessing the 400 errors are coming from there. Looking at the IIS and application logs for the APIs, we don't see the requests ever reaching the APIs. Has anyone else run into this issue and found a fix for it?

I figured out that in the .NET Core 3.1 build only the calls to the API controllers work. Any call to an OData endpoint fails with HTTP status code 400. In .NET 5 all controllers, API or OData, fail equally.

I tried the Ocelot reverse proxy and saw the same issues, so I now believe it isn't a problem with the proxy/gateway libraries. We decided to configure our deployed services to permit requests from the dev environment and got an interesting result. Running the site from IIS Express, via Visual Studio 2019, everything works. When the site is deployed to my local instance of IIS and I run it from there, the OData queries begin to fail again. I'm now looking into IIS configuration and modules to see what might be causing the problem.


Solution

  • In the end, the problem lay with a load balancer that sat between the YARP proxy and the backend API. Since requests sent directly to the API worked while the ones going through the proxy failed, we decided to log all of the headers the proxy was sending to the API. There were 27 headers in total; the two largest were the Cookie forwarded from the front-end request and the Authorization header. Here are the steps we took to debug the issue.

    1. Copied these headers into a Postman request and sent them to an endpoint that didn't require any authentication. The request failed.
    2. Removed the Cookie header, since it was only needed by the backend-for-frontend to handle identity management. The request worked.
    3. Removed the Authorization header but added the Cookie header back in. The request worked again.
    4. Added a bogus header and kept increasing its size to see whether the request would eventually stop working. Once the header grew past 1000 characters, our API call stopped working again.
    5. Removed one character from a random header and sent the request to the API. That worked.
      With these tests we determined that the load balancer was throwing away HTTP requests once they reached a certain total header size. Since the API has no use for the Cookie header, we removed it along with another header that served no purpose. This fixed the problem.
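
Steps 4 and 5 amount to searching for the header size at which requests start to fail. The sketch below automates that idea under some assumptions: `send_ok` is a hypothetical callable that sends a request with the given headers and returns True when it succeeds (in practice it would wrap an HTTP client call), `X-Padding` is a made-up header name, and the size formula is only an approximation of how an HTTP/1.1 header block is counted. The load balancer's real limit is not stated in this post.

```python
from typing import Callable, Dict


def header_block_size(headers: Dict[str, str]) -> int:
    """Approximate bytes the headers occupy in an HTTP/1.1 request.

    Each header goes on the wire as "Name: value\r\n", so every entry
    costs len(name) + len(value) + 4 bytes.
    """
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8")) + 4
        for name, value in headers.items()
    )


def find_max_padding(
    base_headers: Dict[str, str],
    send_ok: Callable[[Dict[str, str]], bool],
    upper: int = 16384,
) -> int:
    """Binary-search the largest padding length that still succeeds.

    Returns -1 if the request fails even with an empty padding header,
    and `upper` if it never fails within the searched range.
    """

    def ok(padding_len: int) -> bool:
        headers = dict(base_headers)
        headers["X-Padding"] = "a" * padding_len  # hypothetical bogus header
        return send_ok(headers)

    if not ok(0):
        return -1
    if ok(upper):
        return upper

    lo, hi = 0, upper  # invariant: ok(lo) is True, ok(hi) is False
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Adding the returned padding length to `header_block_size` of the padded request gives a rough estimate of the total header size the intermediary accepts. This only works if the failure depends on size alone, which is what the tests above suggested.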

    The original post title suggests the issue lies with going from .NET Core 3.1 to .NET 5.0, but I don't think that is really the case. One other thing we noticed was that the Authorization and Cookie header values varied widely in size depending on who was using the service and when. I think that in some cases those headers were small enough to pass through the load balancer, but when our session was refreshed their size increased.
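
To illustrate why the same route could pass for one session and fail for another, here is a minimal sketch that measures the header block against a budget and drops headers the backend does not use. The 4096-byte budget and the sample header values are assumptions for illustration only; the function names are hypothetical and are not part of YARP, where the equivalent change would be a request header transform on the route.

```python
from typing import Dict, Iterable, Tuple

# Headers the backend APIs in this post had no use for; adjust as needed.
DROP_BEFORE_FORWARDING = ("cookie",)


def strip_headers(
    headers: Dict[str, str], drop: Iterable[str] = DROP_BEFORE_FORWARDING
) -> Dict[str, str]:
    """Return a copy of headers without the names in drop (case-insensitive)."""
    drop_set = {name.lower() for name in drop}
    return {k: v for k, v in headers.items() if k.lower() not in drop_set}


def size_report(headers: Dict[str, str], budget: int) -> Tuple[int, bool]:
    """Return (approximate_bytes, fits_within_budget) for the header block.

    Counts "Name: value\r\n" per header, i.e. name + value + 4 bytes.
    """
    total = sum(
        len(k.encode("utf-8")) + len(v.encode("utf-8")) + 4
        for k, v in headers.items()
    )
    return total, total <= budget


if __name__ == "__main__":
    ASSUMED_BUDGET = 4096  # assumption; the real limit is not given in the post
    token = "Bearer " + "t" * 800
    sessions = {
        "fresh session": {"Authorization": token, "Cookie": "s=" + "c" * 1000},
        "refreshed session": {"Authorization": token, "Cookie": "s=" + "c" * 3500},
    }
    for label, headers in sessions.items():
        print(label, size_report(headers, ASSUMED_BUDGET))
        print(label, "without Cookie",
              size_report(strip_headers(headers), ASSUMED_BUDGET))
```

Logging a figure like this for each proxied request would have shown the size creeping up after a session refresh, well before anyone had to inspect the load balancer.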

    TL;DR: The amount of data being sent in the headers was causing the load balancer to reject the HTTP requests.