I have a Perl 5.30.0 program on Ubuntu where the combination of File::Slurp and use open ':std', ':encoding(UTF-8)' results in UTF-8 not being read correctly:
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';
use File::Slurp;
my $text = File::Slurp::slurp('input.txt');
print "$text\n";
with "input.txt" being a UTF-8 encoded text file with this content (no BOM):

ö

When I run this, the ö gets displayed as Ã¶. Only when I remove the use open... line does it work as expected and the ö is printed as an ö.

When I manually read the file like below, everything works as expected and I do get the ö:
$text = '';
open my $F, '<', "input.txt" or die "Cannot open file: $!";
while (<$F>) {
$text .= $_;
}
close $F;
print "$text\n";
Why is that, and what is the best way to go here? Is the open pragma outdated, or am I missing something else?
As with many pragmas,[1] the effect of use open is lexically scoped.[2] This means it only affects the remainder of the block or file in which it's found. Such a pragma doesn't affect code in functions outside of its scope, even if they are called from within its scope.
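That scoping rule can be seen directly with PerlIO::get_layers (core Perl); in this sketch, demo.txt is a made-up file name, and the two open() calls are identical except for where they are compiled:

```perl
use strict;
use warnings;

# Create a small UTF-8 test file so the sketch is self-contained.
open my $out, '>:raw', 'demo.txt' or die "Cannot write: $!";
print $out "\xC3\xB6\n";                      # raw UTF-8 bytes for "ö"
close $out;

my ($inside, $outside);
{
    use open ':encoding(UTF-8)';              # in effect until end of block
    open my $fh, '<', 'demo.txt' or die $!;
    $inside = join ' ', PerlIO::get_layers($fh);
    close $fh;
}
open my $fh, '<', 'demo.txt' or die $!;       # same call, outside the scope
$outside = join ' ', PerlIO::get_layers($fh);
close $fh;

print "inside:  $inside\n";   # lists an encoding(...) layer
print "outside: $outside\n";  # no encoding layer
```

The open() inside the block gets the encoding layer; the textually identical open() after the block does not, because the pragma had already gone out of scope when that line was compiled.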
You need to communicate the desire to decode the stream to File::Slurp. This can't be done using slurp, but it can be done using read_file via its binmode parameter.
use open ':std', ':encoding(UTF-8)'; # Still want for effect on STDOUT.
use File::Slurp qw( read_file );
my $text = read_file('input.txt', { binmode => ':encoding(UTF-8)' });
A better module is File::Slurper.
use open ':std', ':encoding(UTF-8)'; # Still want for effect on STDOUT.
use File::Slurper qw( read_text );
my $text = read_text('input.txt');
File::Slurper's read_text defaults to decoding using UTF-8.
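File::Slurper also provides matching writers and a line reader. A small sketch under the assumption that File::Slurper is installed (the file name notes.txt is made up):

```perl
use strict;
use warnings;
use File::Slurper qw( read_text write_text read_lines );

# write_text encodes as UTF-8 by default, mirroring read_text's decoding.
write_text('notes.txt', "\x{f6}\n");   # "ö" plus newline

my $text  = read_text('notes.txt');    # already decoded
my @lines = read_lines('notes.txt');   # decoded lines, chomped by default

# A different encoding can be passed as the second argument:
# my $latin = read_text('legacy.txt', 'latin-1');
```

Because both directions default to UTF-8, a write_text/read_text round trip preserves the characters without any manual layer management.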
Without modules, you could use
use open ':std', ':encoding(UTF-8)';
my $text = do {
my $qfn = "input.txt";
open( my $fh, '<', $qfn )
    or die( "Can't open file \"$qfn\": $!\n" );
local $/;
<$fh>
};
Of course, that's not as clear as the earlier solutions.
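Alternatively, the decoding layer can be requested in the three-argument open itself, which keeps the intent local to that one handle and avoids the pragma for file handles entirely. A self-contained core-Perl sketch (the sample file is created first so the example runs on its own):

```perl
use strict;
use warnings;

# Create a sample UTF-8 file so the sketch is self-contained.
my $qfn = 'input.txt';
open my $out, '>:raw', $qfn or die "Cannot write: $!";
print $out "\xC3\xB6\n";              # raw UTF-8 bytes for "ö"
close $out;

# Request the decoding layer directly in open():
open my $fh, '<:encoding(UTF-8)', $qfn
    or die "Can't open file \"$qfn\": $!\n";
my $text = do { local $/; <$fh> };    # slurp the whole decoded file
close $fh;
```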
[1] Such as use VERSION, use strict, use warnings, use feature and use utf8.
[2] :std is global.