Search code examples
browserurl-encodingaddress-bar

url-encoding in browser address bar


When I put some non-alpha-numeric symbols in browser address bar, they got url-encoded. For example, https://www.php.net/manual-lookup.php?pattern=привет turns into https://www.php.net/manual-lookup.php?pattern=%EF%F0%E8%E2%E5%F2.

The question is: what do those two percent-prefixed hex digits mean?


Solution

  • they are bytes of the Windows 1251 encoding of Cyrillic. Since there are only six of them, they can't be UTF-8, since it takes 12 bytes of UTF-8 for 6 chars of Cyrillic.

    The code chart for CP1251 can be found here: http://en.wikipedia.org/wiki/Windows-1251.

    Just like 20 is hex for a space, each of the Cyrillic characters has its numeric value, expressible as two hex digits.