
Decoding HTTP headers with Unicode characters in Node.js


I've got an Express server running with the following cors middleware config:

const express = require('express');
const cors = require('cors');

const app = express();

app.use(
  cors({
    origin: [
      /^http:\/\/localhost:\d+/,
      /^https:\/\/щоденниквражень\.укр/,
      /^https:\/\/xn--80adfecflqzagb7a3ioc\.xn--j1amh/,
    ],
  }),
);

(xn--80adfecflqzagb7a3ioc.xn--j1amh is the Punycode representation of щоденниквражень.укр)
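Node.js can do this conversion itself. A minimal sketch using the built-in url module (assuming a reasonably recent Node.js version), showing the Unicode and Punycode forms of the hostname above:

```javascript
const { domainToASCII, domainToUnicode } = require('url');

// url.domainToASCII converts every non-ASCII label to its xn-- (Punycode) form;
// it returns an empty string if the domain is invalid.
const unicodeHost = 'щоденниквражень.укр';
const asciiHost = domainToASCII(unicodeHost);
console.log(asciiHost); // xn--80adfecflqzagb7a3ioc.xn--j1amh

// The conversion is reversible.
console.log(domainToUnicode(asciiHost) === unicodeHost); // true
```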

I have made requests to https://api.щоденниквражень.укр from a page hosted at https://щоденниквражень.укр. Most browsers send the Punycode representation in the Origin header, which works as expected.

But IE11 sends the Unicode origin as is: https://щоденниквражень.укр. That should match the second regex in the list, but on the server side req.headers.origin holds the following value:
Origin: https://Ñ Ð¾Ð´ÐµÐ½Ð½Ð¸ÐºÐ²ÑаженÑ.ÑкÑ
This obviously fails to match any of the regexes. (Some characters may not display correctly here, but you get the idea: the charset is wrong.)
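The garbled value is consistent with the UTF-8 bytes of the Cyrillic text being read back one byte per character, i.e. as Latin-1. The sketch below only reproduces that effect with Node's Buffer to show where the extra characters come from; it is an assumption about the mechanism, not a trace of what IE11 and Node.js do internally:

```javascript
// The origin as the user sees it: 27 characters, 18 of them Cyrillic.
const original = 'https://щоденниквражень.укр';

// UTF-8 encodes each Cyrillic letter as 2 bytes, e.g. 'щ' -> 0xD1 0x89.
const utf8Bytes = Buffer.from(original, 'utf8');

// Reading those bytes as Latin-1 turns every byte into its own character,
// so 'щ' becomes 'Ñ' followed by the invisible control character U+0089.
const garbled = utf8Bytes.toString('latin1');

console.log(original.length, utf8Bytes.length, garbled.length); // 27 45 45

// The Unicode regex from the cors config no longer matches.
console.log(/^https:\/\/щоденниквражень\.укр/.test(garbled)); // false
```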

Is it possible to fix this? I suspect I need to set the encoding somewhere, but I don't know where, or which encoding to choose. Any help is appreciated!


Solution

  • First, the problem is not a charset setting: for some reason, Node.js decodes the Cyrillic characters in the header incorrectly. I haven't found a proper solution, so I would be more than happy if someone posted one here :)

    But I do have a workaround. I found the website https://dom.hastin.gs/files/utf8/#, which can repair my Origin value and turn it back into https://щоденниквражень.укр. Inspecting its source in DevTools, I found that it uses a library file called unicode.min.js (strangely, I couldn't find a GitHub repo or unminified source for it). Here is a link to the library: https://dom.hastin.gs/files/utf8/unicode.min.js (in case that link ever breaks, I made a backup on Google Drive: https://drive.google.com/file/d/1erDSjdEQL5tOAvodeaVdHfnx7CvKApmn/view?usp=sharing).

    Now I can use the library in my code like this to convert the Origin string:

    // Load Cyrillic characters
    // Check out `Unicode.blocks` for a list of available blocks,
    // then call `Unicode.load(<START>, <END>)`
    Unicode.load(1024, 1279); // U+0400..U+04FF, the Cyrillic block
    
    // Fix the string
    Unicode.fix(req.headers.origin); // For IE11's garbled Origin, returns 'https://щоденниквражень.укр'
    

    I know this isn't a proper solution, but it gets the job done, and I hope it helps anyone who stumbles across this issue. It is also a more general problem, not strictly related to CORS: handling non-ASCII characters in HTTP headers in Node.js.

    Update: I have run the library code through a beautifier and studied it. The author did a really good job, but in my opinion it is overkill for the specific purpose of decoding HTTP headers. There are plenty of opportunities to improve performance and reduce complexity, so I recommend that anyone who wants to use this library review the code and refactor it to fit their specific use case, which is what I did. I am happy with the result, and I think it can be considered a good solution to the problem.
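The refactored code from this answer isn't shown, so here is a separate, much smaller sketch of a use-case-specific approach; it is not the logic of unicode.min.js. It assumes Node.js hands over each raw header byte as one Latin-1 character (consistent with the garbled value in the question) and a Node.js version with the global TextDecoder. It re-encodes the value as Latin-1 bytes and decodes them as strict UTF-8, keeping the original value when that fails, then lets the WHATWG URL parser convert any Unicode hostname to Punycode so that a single ASCII pattern covers both IE11 and other browsers. fixHeaderValue, normalizeOrigin and isAllowedOrigin are hypothetical names:

```javascript
const { URL } = require('url');

// Strict UTF-8 decoder: throws a TypeError on malformed input instead of
// silently inserting U+FFFD replacement characters.
const strictUtf8 = new TextDecoder('utf-8', { fatal: true });

// Hypothetical helper: undo a "UTF-8 bytes read as Latin-1" decoding,
// leaving the value unchanged whenever that interpretation does not fit.
function fixHeaderValue(value) {
  if (!/[^\x00-\x7f]/.test(value)) return value; // pure ASCII: nothing to fix
  if (/[^\x00-\xff]/.test(value)) return value; // not one byte per character
  try {
    return strictUtf8.decode(Buffer.from(value, 'latin1'));
  } catch {
    return value; // bytes are not valid UTF-8, keep the original text
  }
}

// Allowed origins, written only in their ASCII (Punycode) form.
const allowedOrigins = [
  /^http:\/\/localhost:\d+$/,
  /^https:\/\/xn--80adfecflqzagb7a3ioc\.xn--j1amh$/,
];

// Hypothetical helper: repair the raw Origin header, then let the URL parser
// convert a Unicode hostname to Punycode and serialize the origin.
function normalizeOrigin(rawOrigin) {
  if (typeof rawOrigin !== 'string') return null;
  try {
    return new URL(fixHeaderValue(rawOrigin)).origin;
  } catch {
    return null; // e.g. the literal "null" origin or a malformed value
  }
}

function isAllowedOrigin(rawOrigin) {
  const origin = normalizeOrigin(rawOrigin);
  return origin !== null && allowedOrigins.some((re) => re.test(origin));
}
```

With the cors package, such a check can be passed through the origin option, which also accepts a function: origin: (origin, callback) => callback(null, isAllowedOrigin(origin)). The patterns are anchored with $ on purpose; without it, the Punycode pattern would also accept an origin such as https://xn--80adfecflqzagb7a3ioc.xn--j1amh.example.com.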