
Perl: replacing all occurrences of Unicode U+2103 (degree Celsius) in a Perl6::Slurp-ed file


I am struggling with reading a UTF-8 text file and replacing all occurrences of a Unicode character (℃, degree Celsius, U+2103) with some other string.

#!/usr/bin/env perl
use 5.030;
use warnings;
use utf8;
use Perl6::Slurp;

my $s= "Hello.  This is 2.3℃ .";
$s =~ s/2\.3/two-point-three /gms;  # escape the dot so it matches a literal "."
$s =~ s/\x{2103}/degrees celsius/gms;
print "STRING: '$s'\n";

my $fs= slurp("test.md");
$fs =~ s/2\.3/two-point-three /gms;  # escape the dot so it matches a literal "."
$fs =~ s/\x{2103}/degrees celsius/gms;
print "FSYSTM: '$fs'";

My test.md file contains the same text as the string:


Hello.  This is 2.3℃ .

Why is the output the following, with ℃ left unreplaced in the file contents?

STRING: 'Hello.  This is two-point-three degrees celsius .'
FSYSTM: 'Hello.  This is two-point-three ℃ .

Solution

  • You need to specify the file encoding when reading a non-ASCII file with Perl6::Slurp:

    The following works for me:

    my $fs= slurp('<:utf8', "test.md");
    

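    This reads the file and decodes its contents from UTF-8 bytes into Perl character strings. Without the layer, slurp returns undecoded bytes, so ℃ arrives as the three bytes E2 84 83 and the pattern \x{2103} never matches. See perluniintro and the Perl6::Slurp documentation for more information.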
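
The difference between undecoded and decoded input can be reproduced without Perl6::Slurp. The following is a minimal sketch that uses only core Perl I/O layers in place of the module; read_file and replace_celsius are hypothetical helper names introduced for illustration, and the sample file is a temporary file rather than test.md.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                        # this source file contains a literal ℃
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a whole file through the given I/O layer.
sub read_file {
    my ($path, $layer) = @_;
    open(my $fh, "<$layer", $path) or die "Cannot open $path: $!";
    local $/;                    # slurp mode: read everything in one go
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Hypothetical helper: replace U+2103 with a plain-text phrase.
sub replace_celsius {
    my ($text) = @_;
    $text =~ s/\x{2103}/degrees celsius/g;
    return $text;
}

# Write a UTF-8 encoded sample file with the same text as the question.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} "Hello.  This is 2.3℃ .\n";
close $out;

my $raw     = read_file($path, ':raw');              # bytes, not characters
my $decoded = read_file($path, ':encoding(UTF-8)');  # decoded characters

# ℃ is one character once decoded, but three bytes in the raw string.
printf "raw is %d longer than decoded\n", length($raw) - length($decoded);

binmode(STDOUT, ':encoding(UTF-8)');
print "raw changed:     ", (replace_celsius($raw)     ne $raw     ? "yes" : "no"), "\n";
print "decoded changed: ", (replace_celsius($decoded) ne $decoded ? "yes" : "no"), "\n";
print "result: ", replace_celsius($decoded);
```

Only the decoded string contains the single character U+2103, so only there does the substitution take effect. If the result is printed, STDOUT usually needs an encoding layer as well, as in the binmode call above.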