Tags: iis, utf-8, isapi, wfastcgi

IIS incorrectly decodes URLs containing characters outside the system locale


IIS appears to deliver the request URL incorrectly to a web application when the URL contains UTF-8-encoded characters that the current system locale does not support. All such "unsupported" characters are replaced by question marks ('?').

Example: The system locale is set to Norwegian. The following URL works fine:

/myapp/Blåbærsyltetøy/

The following URL does not work:

/myapp/черничный-джем/

In both URLs, non-ASCII characters are encoded as UTF-8 and then percent-encoded, so the actual URLs look like this:

/myapp/Bl%C3%A5b%C3%A6rsyltet%C3%B8y/
/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9-%D0%B4%D0%B6%D0%B5%D0%BC/
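
The symptom can be reproduced outside IIS with a small Python sketch. It assumes that the Norwegian system locale maps to the Windows-1252 ANSI code page and that the damage comes from squeezing the decoded text through that code page. This is only a model of the observed behavior, not a description of what IIS does internally; `simulate_locale_decoding` is a hypothetical helper name.

```python
from urllib.parse import unquote_to_bytes

# Assumption: the Norwegian system locale uses Windows-1252 as its ANSI code page.
ANSI_CODE_PAGE = "cp1252"


def simulate_locale_decoding(percent_encoded_path):
    """Model the symptom: decode as UTF-8, then force the text through the ANSI code page."""
    text = unquote_to_bytes(percent_encoded_path).decode("utf-8")
    # Characters that have no mapping in the code page are replaced by '?'.
    return text.encode(ANSI_CODE_PAGE, errors="replace").decode(ANSI_CODE_PAGE)


norwegian = simulate_locale_decoding("/myapp/Bl%C3%A5b%C3%A6rsyltet%C3%B8y/")
russian = simulate_locale_decoding(
    "/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9-"
    "%D0%B4%D0%B6%D0%B5%D0%BC/"
)
print(norwegian)  # survives, because å, æ and ø exist in Windows-1252
print(russian)    # every Cyrillic letter is replaced by '?'
```

Once the question marks have been substituted, the original characters cannot be recovered, which is why the application needs access to the URL before this conversion happens.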

The application uses two ways of handling requests:

  • wfastcgi + Python
  • ISAPI + C++

Both suffer from the same problem, and both work correctly when the URL contains only characters that the system locale supports.

In the case of ISAPI, EXTENSION_CONTROL_BLOCK::lpszPathInfo appears to deliver an already percent-decoded URL in which all "unsupported" characters have been replaced by question marks. The lpszPathInfo member is a multi-byte character string, and there is no wide-character version of this structure.

Is there a way to get the original, percent-encoded URL, or to prevent IIS from decoding URLs, in order to work around the problem?


Solution

  • Solution for ISAPI

    Get the request URL from the server variable HTTP_URL rather than PATH_INFO. This delivers the original, percent-encoded URL, which can then be decoded correctly (by percent-decoding it to an array of bytes and interpreting that array of bytes as a UTF-8-encoded string).

    HTTP_URL also contains the query string and the original path before URL rewriting, which may be unwanted, so it may need some extra processing.

    Also, for error handler requests, this variable contains a string in a format similar to

    <DLL_PATH>?<STATUS_CODE>;<ORIGINAL_HTTP_URL>
    

    which needs to be parsed. Even so, it contains all the information that PATH_INFO contains, without the incorrect decoding.

    Note: Getting PATH_INFO using GetServerVariable, rather than from the EXTENSION_CONTROL_BLOCK structure, does not solve the encoding problem.
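
The processing steps described above for HTTP_URL can be sketched as follows. The sketch is in Python for brevity; an ISAPI extension would perform the same steps in C++ on the string returned by GetServerVariable. The function names and the sample values are hypothetical. It is also an assumption that the `<ORIGINAL_HTTP_URL>` part of an error handler request has the same form as a regular HTTP_URL value, so check this against real requests.

```python
from urllib.parse import unquote_to_bytes


def split_error_handler_url(http_url):
    """Split '<DLL_PATH>?<STATUS_CODE>;<ORIGINAL_HTTP_URL>' into its three parts.

    Returns (dll_path, status_code, original_url), or None if the value
    does not have that shape.
    """
    dll_path, sep, rest = http_url.partition("?")
    if not sep:
        return None
    status, sep, original_url = rest.partition(";")
    if not sep or not status.isdigit():
        return None
    return dll_path, int(status), original_url


def decoded_path(http_url):
    """Return the UTF-8-decoded path of a raw, percent-encoded HTTP_URL value."""
    # HTTP_URL includes the query string, so drop it first.
    raw_path, _, _query = http_url.partition("?")
    # Percent-decode to raw bytes, then interpret those bytes as UTF-8.
    # Invalid UTF-8 raises UnicodeDecodeError instead of being silently replaced.
    return unquote_to_bytes(raw_path).decode("utf-8")


# Regular request (hypothetical value).
print(decoded_path("/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9/?page=2"))

# Error handler request (hypothetical value): unwrap first, then decode.
parts = split_error_handler_url("/myapp/handler.dll?404;/myapp/Bl%C3%A5b%C3%A6r/")
if parts is not None:
    _dll_path, status_code, original_url = parts
    print(status_code, decoded_path(original_url))
```

The two functions are kept separate on purpose: a regular request whose query string happens to look like `404;...` would otherwise be mistaken for an error handler request, so the caller should decide which case applies.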

  • Solution for wfastcgi

    Server variables are encoded using the system locale (called 'mbcs' in Python) by default. This behavior can be changed by setting a registry value:

    reg add HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\w3svc\Parameters /v FastCGIUtf8ServerVariables /t REG_MULTI_SZ /d REQUEST_URI\0PATH_INFO
    

    Note that this will affect all wfastcgi applications on the same server and may break existing applications which do not expect variables to be UTF-8-encoded (rather unlikely, as any sane application that uses non-ASCII URLs would use UTF-8 encoding...).

    See also https://support.microsoft.com/en-us/help/2277918/fix-a-php-application-that-depends-on-the-request-uri-server-variable
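
To confirm that the registry value was written as intended, it can be read back from Python. This is a minimal sketch using the standard `winreg` module; `read_utf8_server_variables` is a hypothetical helper name, and the script returns `None` when it is not run on Windows or when the value does not exist.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

KEY_PATH = r"System\CurrentControlSet\Services\w3svc\Parameters"
VALUE_NAME = "FastCGIUtf8ServerVariables"


def read_utf8_server_variables():
    """Return the configured variable names as a list, or None if unavailable."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except OSError:
        # The key or the value does not exist, or access was denied.
        return None
    if value_type != winreg.REG_MULTI_SZ:
        return None
    # A REG_MULTI_SZ value is returned as a list of strings.
    return list(value or [])


print(read_utf8_server_variables())
```

After the `reg add` command shown earlier, the returned list should contain REQUEST_URI and PATH_INFO.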