
How do I decode a double backslashed PERLQQ escaped string into Perl characters?


I read lines from a file that contains UTF-8 octets written out as double-backslashed \\xNN escape sequences, and I want to convert them to Perl's internal character representation for further processing.

file.in (plain ASCII):

MO\\xc5\\xbdN\\xc3\\x81
NOV\\xc3\\x81

These should translate to MOŽNÁ and NOVÁ.

I load the lines and rewrite the escapes into what I believe is proper UTF-8 notation, i.e. \\xc5\\xbd -> \x{00c5}\x{00bd}. Then I would like to take this rewritten $line and make Perl represent it internally:

for my $line (@lines) {
    $line =~ s/x(..)/x{00$1}/g;
    eval { $l = "$line"; };
}

Unfortunately, this does not work.


Solution

    use File::Slurp qw(read_file);
    use Encode qw(decode);
    use Encode::Escape qw();
    
    my $string =
        decode 'UTF-8',             # octets → characters
        decode 'unicode-escape',    # \x → octets
        decode 'ascii-escape',      # \\x → \x
        read_file 'file.in';
    

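    Read the chain from the bottom upwards: read_file returns the file contents, 'ascii-escape' turns \\x into \x, 'unicode-escape' turns each \xNN into an octet, and 'UTF-8' decodes those octets into characters.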
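
    If installing Encode::Escape is not an option, the same steps can be approximated with the core Encode module alone. The following is only a minimal sketch, not part of the answer above: the helper name decode_escaped is hypothetical, and it assumes every escape consists of one or two literal backslashes, an x, and exactly two hex digits.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a line such as MO\\xc5\\xbdN\\xc3\\x81
# into a Perl character string.
sub decode_escaped {
    my ($line) = @_;

    # Step 1: replace each \xNN or \\xNN escape with the single octet it names.
    (my $octets = $line) =~ s/\\{1,2}x([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # Step 2: interpret the octets as UTF-8; die on malformed input.
    return decode('UTF-8', $octets, Encode::FB_CROAK());
}

# Single-quoted literals: each \\\\ below is two literal backslashes,
# matching the contents of file.in.
my @lines = ('MO\\\\xc5\\\\xbdN\\\\xc3\\\\x81', 'NOV\\\\xc3\\\\x81');

binmode STDOUT, ':encoding(UTF-8)';
print decode_escaped($_), "\n" for @lines;   # prints MOŽNÁ, then NOVÁ
```

    Unlike the attempt in the question, this never builds \x{00c5}\x{00bd}: those would be two separate characters, whereas c5 bd are two octets that together encode the single character Ž.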