perl, encoding, character-encoding, mediawiki, url-encoding

MediaWiki API section names encoding


For [[Test#?]], the action=parse module of the MediaWiki API returns "Test#.3F". What is this encoding, and how do I convert it to a human-readable form using a Perl module from CPAN?

URI::Encode handles ordinary percent-decoding, but not this section-name encoding.


Solution

  • It is UTF-8 percent-encoding, but with . in place of % and with spaces replaced by underscores. In addition, runs of consecutive whitespace are collapsed into one, and : is preserved (not encoded as .3A).

    The code that actually handles this is Parser::guessSectionNameFromWikiText(). If you do not want to dig through a lot of code, an older MediaWiki version has a much simpler implementation (compatible except for a few edge cases) in anchorencode():

    str_replace( '%', '.', str_replace('+', '_', urlencode( $text ) ) );
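The rules above can be sketched in both directions. The following is a minimal illustration in Python, not MediaWiki's own code. The function names encode_section_anchor and decode_section_anchor are hypothetical, and the encoder only mirrors the simple anchorencode() one-liner plus the whitespace and colon rules described above. Note that decoding is a best-effort guess: a literal dot followed by two hex digits (for example "v1.2A"), or a literal underscore, cannot be told apart from an encoded character or a space.

```python
import re
from urllib.parse import quote_plus, unquote

# A dot followed by exactly two hex digits is treated as an escaped byte.
_DOT_ESCAPE = re.compile(r"\.([0-9A-Fa-f]{2})")


def encode_section_anchor(name: str) -> str:
    """Approximate the old anchorencode() behaviour (hypothetical helper)."""
    # Collapse runs of whitespace into a single space.
    collapsed = re.sub(r"\s+", " ", name)
    # Percent-encode as UTF-8, keeping ':' as-is; spaces become '+'.
    encoded = quote_plus(collapsed, safe=":")
    # Swap '+' for '_' and '%' for '.', as in the PHP one-liner.
    return encoded.replace("+", "_").replace("%", ".")


def decode_section_anchor(anchor: str) -> str:
    """Best-effort inverse: ambiguous for literal dots and underscores."""
    # Underscores stand for spaces in the encoded form.
    spaced = anchor.replace("_", " ")
    # Turn '.XX' back into '%XX' so a normal percent-decoder can handle it.
    percent_encoded = _DOT_ESCAPE.sub(r"%\1", spaced)
    # unquote() decodes the bytes as UTF-8 by default.
    return unquote(percent_encoded)


if __name__ == "__main__":
    page, _, anchor = "Test#.3F".partition("#")
    print(page + "#" + decode_section_anchor(anchor))  # Test#?
```

The same three steps (underscore to space, ".XX" to "%XX", then an ordinary UTF-8 percent-decode) carry over to Perl with a regex substitution followed by whichever percent-decoding module is already in use.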