
Why does File::Slurp get UTF8 characters wrong when I use open ':std', ':encoding(UTF-8)';?


I have a Perl 5.30.0 program on Ubuntu where the combination of File::Slurp and open ':std', ':encoding(UTF-8)' results in UTF-8 text not being read correctly:

use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';
use File::Slurp;

my $text = File::Slurp::slurp('input.txt');
print "$text\n";

with "input.txt" being a UTF-8 encoded text file with this content (no BOM):

ö

When I run this, the ö is displayed as Ã¶. Only when I remove the use open... line does it work as expected, with the ö printed as ö.

When I manually read the file like below, everything works as expected and I do get the ö:

$text = '';
open my $F, '<', "input.txt" or die "Cannot open file: $!";
while (<$F>) {
    $text .= $_;
}
close $F;
print "$text\n";

Why is that and what is the best way to go here? Is the open pragma outdated or am I missing something else?


Solution

  • As with many pragmas,[1] the effect of use open is lexically scoped.[2] This means it only affects the remainder of the block or file in which it's found. Such a pragma doesn't affect code in functions defined outside of its scope, even if they are called from within its scope.

    You need to tell File::Slurp that you want the stream decoded. This can't be done using slurp, but it can be done using read_file via its binmode parameter.

    use open ':std', ':encoding(UTF-8)';  # Still want for effect on STDOUT.
    use File::Slurp qw( read_file );
    
    my $text = read_file('input.txt', { binmode => ':encoding(UTF-8)' });
    

    A better module is File::Slurper.

    use open ':std', ':encoding(UTF-8)';  # Still want for effect on STDOUT.
    use File::Slurper qw( read_text );
    
    my $text = read_text('input.txt');
    

    File::Slurper's read_text defaults to decoding using UTF-8.


    Without modules, you could use

    use open ':std', ':encoding(UTF-8)';
    
    my $text = do {
       my $qfn = "input.txt";
       open( my $fh, '<', $qfn )
          or die( "Can't open file \"$qfn\": $!\n" );
       local $/;
       <$fh>
    };
    

    Of course, that's not as clear as the earlier solutions.
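
    The lexical scoping described at the top of this answer can be observed without File::Slurp. The following is only a minimal sketch, assuming default PerlIO settings (no PERL_UNICODE or PERLIO environment overrides). read_outside is a hypothetical stand-in for a module function such as File::Slurp's: it is compiled outside the scope of use open, so the pragma has no effect on its open call even when it is called from inside that scope.

    ```perl
    use strict;
    use warnings;
    use File::Temp qw( tempfile );

    # Write the two UTF-8 bytes of "ö" (0xC3 0xB6) to a temporary file.
    my ($tmp_fh, $tmp_qfn) = tempfile( UNLINK => 1 );
    binmode($tmp_fh);
    print {$tmp_fh} "\xC3\xB6";
    close($tmp_fh)
       or die( "Can't close \"$tmp_qfn\": $!\n" );

    # Hypothetical helper, compiled outside the scope of any "use open".
    sub read_outside {
       my ($qfn) = @_;
       open( my $fh, '<', $qfn )
          or die( "Can't open file \"$qfn\": $!\n" );
       local $/;
       return scalar( <$fh> );
    }

    my ($inside, $outside);
    {
       use open ':encoding(UTF-8)';   # Lexically scoped to this block.

       # This open is compiled inside the scope, so the file is decoded.
       open( my $fh, '<', $tmp_qfn )
          or die( "Can't open file \"$tmp_qfn\": $!\n" );
       local $/;
       $inside = <$fh>;

       # Called from inside the scope, but compiled outside of it.
       $outside = read_outside($tmp_qfn);
    }

    printf "inside:  %d character(s)\n", length($inside);    # 1 (decoded)
    printf "outside: %d character(s)\n", length($outside);   # 2 (raw bytes)
    ```

    The two undecoded bytes are what later get encoded a second time by the UTF-8 layer on STDOUT, which is why the ö shows up as two characters.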


    1. Other notable examples include use VERSION, use strict, use warnings, use feature and use utf8.
    2. The effect on STDIN, STDOUT and STDERR from :std is global.