How can I reverse a string that contains combining characters in Perl?


I have the string "re\x{0301}sume\x{0301}" (which prints like this: résumé) and I want to reverse it to "e\x{0301}muse\x{0301}r" (émusér). I can't use Perl's reverse because it treats combining characters like "\x{0301}" as separate characters, so I wind up getting "\x{0301}emus\x{0301}er" ( ́emuśer). How can I reverse the string, but still respect the combining characters?
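To make the failure concrete, here is a minimal sketch that dumps the code points before and after a plain `reverse`. The helper name `code_points` is made up for this illustration; everything else is core Perl.

```perl
use strict;
use warnings;

my $original = "re\x{0301}sume\x{0301}";

# Hypothetical helper: format each code point of a string as U+XXXX.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

# reverse needs scalar context to reverse the characters of a string.
print 'original: ', code_points($original), "\n";
print 'reversed: ', code_points(scalar reverse $original), "\n";

# original: U+0072 U+0065 U+0301 U+0073 U+0075 U+006D U+0065 U+0301
# reversed: U+0301 U+0065 U+006D U+0075 U+0073 U+0301 U+0065 U+0072
```

In the reversed string each U+0301 now precedes the "e" it used to follow, so the accent ends up attached to whatever comes before it (or to nothing at all at the start of the string).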


Solution

  • The best answer is to use Unicode::GCString, as Sinan points out in his answer.


    I modified Chas's example a bit:

    • Set the encoding on STDOUT to avoid "Wide character in print" warnings.
    • I also tried a positive lookahead assertion in split instead of separator retention mode, but that apparently doesn't work after Perl 5.10, so I removed it and kept the capturing group.

    It's basically the same thing with a couple of tweaks.

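    use strict;
    use warnings;
    
    # Encode output as UTF-8 so print does not warn about wide characters.
    binmode STDOUT, ":encoding(UTF-8)";
    
    my $original = "re\x{0301}sume\x{0301}";
    
    # Scalar-context reverse works per code point, so the accents detach.
    my $wrong    = reverse $original;
    
    # \X matches one extended grapheme cluster (a base character plus its
    # combining marks). The capturing group makes split return the clusters;
    # the empty fields between them vanish in the join.
    my $right    = join '', reverse split /(\X)/, $original;
    
    print <<HERE;
    original: [$original]
       wrong: [$wrong]
       right: [$right]
    HERE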
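
A variation on the same idea, as a sketch only: instead of splitting with a capturing group, match `\X` globally in list context, which returns the grapheme clusters directly with no empty fields in between. The name `reverse_graphemes` is an invented helper, not part of any module, and the code assumes the input is already a decoded character string.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse a string one grapheme cluster at a time.
sub reverse_graphemes {
    my ($string) = @_;
    # In list context, /\X/g returns every extended grapheme cluster in order.
    return join '', reverse $string =~ /\X/g;
}

binmode STDOUT, ':encoding(UTF-8)';

my $original = "re\x{0301}sume\x{0301}";
my $reversed = reverse_graphemes($original);

print "reversed: [$reversed]\n";
print "matches the expected string\n"
    if $reversed eq "e\x{0301}muse\x{0301}r";
```

This uses only core Perl, so it may be a reasonable fallback when installing Unicode::GCString is not an option.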