iis, azure-web-app-service, reverse-proxy, arr

Setting up a reverse proxy in Azure App Service to point requests from subdirectory to subdomain


I have a WordPress website located at https://blog.example.com and another site hosted separately in Azure App Service (Windows) at https://www.example.com. Cloudflare sits in front of both of these sites.

I have set up a reverse proxy that points requests from https://www.example.com/blog to https://blog.example.com. This appears to be mostly working in that the blog posts appear under the expected URL (i.e. https://www.example.com/blog/a-blog-post), but there are a few peculiarities that make me think something is not set up quite right:

  • Submitting certain forms in the WordPress admin dashboard (e.g. general settings) signs the user out of the session and redirects to the sign-in page (a URL parameter references blog.example.com)
  • Using pagination in the WordPress admin dashboard redirects to the right page, but on https://blog.example.com
  • A handful of pages in the WordPress admin dashboard show console errors indicating that something couldn't be loaded from https://blog.example.com
  • When I add a 301 redirect from https://blog.example.com -> https://www.example.com/blog, it goes into an infinite redirect loop.

From my reading, I believe all of these problems occur because, when the request is processed by the server hosting WordPress, the Host header is blog.example.com rather than www.example.com. There are several places (e.g. here) where WordPress uses the Host header to construct certain URLs, rather than the WordPress Website URL or Home URL (both set to https://www.example.com/blog). Microsoft recommends preserving the original host header to resolve these problems.

Application Request Routing (ARR) on IIS has a preserveHostHeader option that can presumably be used to retain the original host header. I've tried enabling this, but the proxy stops working entirely:

  • Visiting https://www.example.com/blog (the root of the blog) shows me the https://www.example.com homepage
  • Visiting https://www.example.com/blog/a-blog-post shows me a 404 (generated by the site at https://www.example.com)

Here is my existing setup:

applicationHost.xdt (to enable ARR on Azure App Service as it's disabled by default)

<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
    <system.webServer>
        <proxy xdt:Transform="InsertIfMissing" enabled="true" preserveHostHeader="false" reverseRewriteHostInResponseHeaders="false"/>
        <rewrite xdt:Transform="InsertIfMissing">
            <allowedServerVariables xdt:Transform="InsertIfMissing">
                <add name="HTTP_X_ORIGINAL_HOST" xdt:Transform="InsertIfMissing" xdt:Locator="Match(name)"/>
                <add name="HTTP_X_UNPROXIED_URL" xdt:Transform="InsertIfMissing" xdt:Locator="Match(name)"/>
                <add name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" xdt:Transform="InsertIfMissing" xdt:Locator="Match(name)"/>
                <add name="HTTP_ACCEPT_ENCODING" xdt:Transform="InsertIfMissing" xdt:Locator="Match(name)"/>
            </allowedServerVariables>
        </rewrite>
    </system.webServer>
</configuration>
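
For comparison, a minimal sketch of the variant with the option enabled (the one that stops the proxy working). It assumes that only the preserveHostHeader attribute changes and that the rest of the file stays as shown above:

<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
    <system.webServer>
        <!-- Assumption: identical to the proxy element above except preserveHostHeader="true" -->
        <!-- With this set, ARR forwards the client's Host (www.example.com) to the backend instead of blog.example.com -->
        <proxy xdt:Transform="InsertIfMissing" enabled="true" preserveHostHeader="true" reverseRewriteHostInResponseHeaders="false"/>
    </system.webServer>
</configuration>

With this variant, the server hosting WordPress receives requests addressed to www.example.com, so it has to be configured to answer for that hostname as well as blog.example.com.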

web.config (to rewrite requests from subdirectory -> subdomain)

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <location path="." inheritInChildApplications="false">
    <system.webServer>
      <handlers>
        <add name="aspNetCore" path="*" verb="*" modules="AspNetCoreModuleV2" resourceType="Unspecified" />
      </handlers>
      <aspNetCore processPath="dotnet" arguments=".\Example.dll" stdoutLogEnabled="false" stdoutLogFile=".\logs\stdout" hostingModel="inprocess" />

      <rewrite>
        <rules>
          <clear />
          <rule name="Blog Proxy" stopProcessing="false">
            <match url="^blog(?:$|/)(.*)" />
            <action type="Rewrite" url="https://blog.example.com/{R:1}" appendQueryString="true" logRewrittenUrl="false" />
            <serverVariables>
              <set name="HTTP_X_UNPROXIED_URL" value="https://blog.example.com/{R:1}" />
              <set name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" value="{HTTP_ACCEPT_ENCODING}" />
              <set name="HTTP_X_ORIGINAL_HOST" value="{HTTP_HOST}" />
              <set name="HTTP_ACCEPT_ENCODING" value="" />
            </serverVariables>
          </rule>
        </rules>
      </rewrite>
    </system.webServer>
  </location>
</configuration>
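
The web.config above contains only an inbound rule, and reverseRewriteHostInResponseHeaders is set to false, so a Location header sent by the backend would presumably still name blog.example.com. A rough, untested sketch of an outbound rule that maps such redirects back under /blog, assuming it is placed inside the existing <rewrite> element next to <rules> (the rule and precondition names are made up for illustration):

<outboundRules>
  <!-- Hypothetical rule: rewrite redirect targets on blog.example.com to the proxied path -->
  <rule name="Blog Proxy Location" preCondition="IsRedirect">
    <match serverVariable="RESPONSE_Location" pattern="^https?://blog\.example\.com(/.*)?$" />
    <action type="Rewrite" value="https://www.example.com/blog{R:1}" />
  </rule>
  <preConditions>
    <!-- Only evaluate the rule for 3xx responses -->
    <preCondition name="IsRedirect">
      <add input="{RESPONSE_STATUS}" pattern="^3\d\d$" />
    </preCondition>
  </preConditions>
</outboundRules>

This only touches the Location response header. URLs embedded in the HTML or in query parameters (such as the sign-in redirect parameter) would not be changed by it.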

This appears to be a pretty standard setup for reverse proxies, but alas. Is this because I'm running behind Cloudflare? Does preserveHostHeader not work with Azure App Service? How can I set this reverse proxy up so that it handles my use case?


Solution

  • The issue turned out to be how the web hosts had set up their environment.

    The way the server is configured, when a request comes in for a specific hostname, the vhost files are checked for that hostname. When a matching one is found, content is loaded from the DocRoot set in that vhost file.

    In my situation, no vhost file was configured to handle requests for www.example.com. The default catch-all vhost file therefore forwarded the request to the catch-all DocRoot on the server, and because the /blog path we appended didn't exist under that DocRoot, the server returned a 404. Adding www.example.com as a pointer domain on the account told the server which DocRoot to serve requests from for that hostname, which appeared to fix the 404 response I was dealing with.

    I think the 'pointer domain' concept may be unique to this particular web host, so this question and answer may not be generally applicable. Hopefully it gives you somewhere to look if you're experiencing the same thing, though.