Suppose you specify the second argument of the open function like this:
open my $fh, ">:via(File::BOM):encoding(UTF-8)", $file or die "Cannot open $file: $!";
Here :via(File::BOM) is specified first and :encoding(UTF-8) second. With this order, the output string is cut off in the middle.
The following script attempts to write a UTF-8 text file with a BOM. The contents should be the concatenation of the strings "xxx yyyy1" through "xxx yyyy100", with the trailing number incrementing, delimited by " / ".
#!/usr/bin/perl
# bomTest.pl
use strict;
use warnings;
use File::BOM;
use feature 'say';
my $file = '/home/cf/Desktop/foo.txt';
open my $fh, ">:via(File::BOM):encoding(UTF-8)", $file or die "Cannot open $file: $!";
say $fh 'xxx yyyy1 / xxx yyyy2 / xxx yyyy3 / xxx yyyy4 / xxx yyyy5 / xxx yyyy6 / xxx yyyy7 / xxx yyyy8 / xxx yyyy9 / xxx yyyy10 / ' .
'xxx yyyy11 / xxx yyyy12 / xxx yyyy13 / xxx yyyy14 / xxx yyyy15 / xxx yyyy16 / xxx yyyy17 / xxx yyyy18 / xxx yyyy19 / xxx yyyy20 / ' .
... snip ...
'xxx yyyy71 / xxx yyyy72 / xxx yyyy73 / xxx yyyy74 / xxx yyyy75 / xxx yyyy76 / xxx yyyy77 / xxx yyyy78 / xxx yyyy79 / xxx yyyy80 / ' .
'xxx yyyy81 / xxx yyyy82 / xxx yyyy83 / xxx yyyy84 / xxx yyyy85 / xxx yyyy86 / xxx yyyy87 / xxx yyyy88 / xxx yyyy89 / xxx yyyy90 / ' .
'xxx yyyy91 / xxx yyyy92 / xxx yyyy93 / xxx yyyy94 / xxx yyyy95 / xxx yyyy96 / xxx yyyy97 / xxx yyyy98 / xxx yyyy99 / xxx yyyy100';
close $fh;
The output file foo.txt is a UTF-8 file with the BOM (0xEF 0xBB 0xBF) at the top, but the string is cut off in the middle, as below:
xxx yyyy1 / xxx yyyy2 / xxx yyyy3 / xxx yyyy4 / xxx yyyy5 / xxx yyyy6 / xxx yyyy7 / xxx yyyy8 / xxx yyyy9 / xxx yyyy10 / xxx yyyy11 / xxx yyyy12 / xxx yyyy13 / xxx yyyy14 / xxx yyyy15 / xxx yyyy16 / xxx yyyy17 / xxx yyyy18 / xxx yyyy19 / xxx yyyy20 / ...snip... xxx yyyy71 / xxx yyyy72 / xxx yyyy73 / xxx yyyy74 / xxx yyyy75 / xxx yyyy76 / xxx yyyy77 / xxx yyyy78 / xxx yyyy79 / xxx yy
The output stops in the middle of "xxx yyyy80".
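One way to confirm the truncation without relying on an editor is to read the file back in raw mode and look at its first and last bytes. The following is only a minimal sketch using core Perl; the helper name inspect_output and the 20-byte tail length are hypothetical choices made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a file starts with the UTF-8 BOM
# and return its last 20 bytes (or the whole file if it is shorter).
sub inspect_output {
    my ($path) = @_;

    # :raw so that no decoding layer hides or alters the bytes on disk.
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    $bytes = '' unless defined $bytes;

    my $has_bom = substr($bytes, 0, 3) eq "\xEF\xBB\xBF" ? 1 : 0;
    my $tail    = length($bytes) > 20 ? substr($bytes, -20) : $bytes;
    return ($has_bom, $tail);
}

# Only runs when a path is given on the command line.
if (@ARGV) {
    my ($has_bom, $tail) = inspect_output($ARGV[0]);
    print "BOM present: ", ($has_bom ? "yes" : "no"), "\n";
    print "Tail: $tail\n";
}
```

Run against foo.txt, a complete file should show a tail ending in "xxx yyyy100", while a truncated one ends earlier.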
Now if you change the script like this:
open my $fh, ">:encoding(UTF-8):via(File::BOM)", $file or die "Cannot open $file: $!";
The only change is the order of the I/O layers: :encoding(UTF-8) is now specified first and :via(File::BOM) last.
Then the script writes the complete output, up to "xxx yyyy100".
What causes this behavior?
Is it a bug in Perl's encoding layer?
Or is it a bug in the File::BOM module?
Or is it the expected, documented behavior?
The answer is already in the question: you used ">:via(File::BOM):encoding(UTF-8)" when you should have used ">:encoding(UTF-8):via(File::BOM)".
From the docs,
Add the via(File::BOM) layer on top of a unicode encoding layer to print a BOM at the start of the output file. This needs to be done before any data is written. The BOM is written as part of the first print command on the handle, so if you don't print anything to the handle, you won't get a BOM.
The wording alone doesn't make it clear which layer is on top of which, but the example in the docs does.
# Writing little-endian UTF-16 file with BOM:
open(HANDLE, '>:encoding(UTF-16LE):via(File::BOM)', $filename)
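Applied to the script in the question, the documented layer order might look like the sketch below. It assumes File::BOM is installed from CPAN, as in the question. The relative path foo.txt and the join/map construction of the test string are illustrative choices, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::BOM;    # CPAN module, as used in the question

# Hypothetical output path; the question uses an absolute path instead.
my $file = 'foo.txt';

# Build "xxx yyyy1 / xxx yyyy2 / ... / xxx yyyy100" instead of typing it out.
my $text = join ' / ', map { "xxx yyyy$_" } 1 .. 100;

# Encoding layer first, :via(File::BOM) last, matching the docs' example.
open my $fh, '>:encoding(UTF-8):via(File::BOM)', $file
    or die "Cannot open $file: $!";
print {$fh} "$text\n";
close $fh or die "Cannot close $file: $!";
```

Checking the return value of close is worthwhile here because buffered layers may only report a write failure when the handle is flushed.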
The bug is yours. This is a case of GIGO.