I have some .html files in a directory, and I want to add one line of CSS code to each of them. Using Perl, I can locate the position with a regex and add the CSS code; this works very well.
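The line-level change described above can be sketched as a single substitution. This is a minimal illustration, assuming the style belongs on every <h2 opening tag; the sample line is ASCII-only and modelled on the heading in the file shown further down.

```perl
use strict;
use warnings;

# ASCII-only sample line, modelled on the <h2> in Chapter_001_not_centered.html
my $line = qq{<h2 class="chapterHead"><span class="titlemark">Chapter 1</span></h2>\n};

# Insert the style attribute right after the tag name; the \b stops the
# pattern from also matching a longer tag name that starts with "h2"
(my $centered = $line) =~ s/<h2\b/<h2 style="text-align: center;"/;

print $centered;
```

Because s/// edits the line in place, everything after <h2 is kept. Capturing pieces with a lazy (.*?) at the very end of a pattern would not work for this, since such a group always matches the empty string.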
However, my first .html file contains an accented letter, é, and in the resulting .html file it comes out as \xE9, so there is an encoding problem.
In the Perl script, I have been careful to specify UTF-8 encoding when opening the files, as shown in the MWE below, but that does not solve the problem. How can I solve this encoding error?
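A likely explanation, assuming the file is actually stored in ISO 8859-1 (Latin-1) rather than UTF-8: in Latin-1, é is the single byte 0xE9, which is not a valid UTF-8 sequence (in UTF-8, é is the two bytes C3 A9). When a UTF-8 decoding layer meets such a byte, Perl replaces it with the literal text \xE9. The sketch below reproduces this with the core Encode module; the sample string is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Brocéliande" as stored on disk in ISO 8859-1: é is the single byte 0xE9
my $latin1_bytes = "Broc\xE9liande";

# Decoding those bytes as UTF-8 hits the invalid byte 0xE9; with PERLQQ the
# bad byte is replaced by the characters \xE9 instead of dying.
# LEAVE_SRC keeps $latin1_bytes unchanged for the next step.
my $as_utf8 = decode('UTF-8', $latin1_bytes, Encode::FB_PERLQQ | Encode::LEAVE_SRC);
print "$as_utf8\n";

# Decoding the same bytes as ISO 8859-1 yields the real character é (U+00E9)
my $as_latin1 = decode('ISO-8859-1', $latin1_bytes);
printf "U+%04X\n", ord substr($as_latin1, 4, 1);    # U+00E9

# For comparison: encoded as UTF-8, é takes two bytes
printf "%vX\n", encode('UTF-8', "\x{E9}");    # C3.A9
```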
MWE
use strict;
use warnings;
# Directory to scan
my $dir = '.';
# Open the current directory
opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
# Scan all entries in the directory
while (my $inputfile = readdir($dh)) {
    # Process only regular files whose names end in _not_centered.html
    next unless -f "$dir/$inputfile";
    next unless $inputfile =~ m/_not_centered\.html$/;
    # Name the output file after the input file, without _not_centered
    (my $outputfile = $inputfile) =~ s/_not_centered//;
    # Open input and output files, both declared as UTF-8
    open(my $ifh, '<:encoding(UTF-8)', "$dir/$inputfile") or die "Cannot read $inputfile: $!";
    open(my $ofh, '>:encoding(UTF-8)', "$dir/$outputfile") or die "Cannot write $outputfile: $!";
    # Copy the input line by line, adding the centering style to each <h2 tag
    while (<$ifh>) {
        s/<h2/<h2 style="text-align: center;"/;
        print $ofh $_;
    }
    # Close input and output files
    close $ifh;
    close $ofh;
}
# Close the directory
closedir($dh);
Problematic file named "Chapter_001_not_centered.html"
<html >
<head></head>
<body>
<h2 class="chapterHead"><span class="titlemark">Chapter 1</span><br /><a id="x1-10001"></a>Brocéliande</h2>
Brocéliande
</body></html>
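One way to check which encoding a file like this really uses is to read its raw bytes and attempt a strict UTF-8 decode. This is a sketch using the core Encode module's FB_CROAK mode; is_valid_utf8 is a helper name made up for illustration. A failed decode shows the file is not UTF-8, but does not by itself prove it is ISO 8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Returns 1 if the byte string is well-formed UTF-8, 0 otherwise.
# FB_CROAK makes decode() die at the first malformed sequence.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 } ? 1 : 0;
}

# Check every file named on the command line, reading raw bytes (no decoding layer)
for my $file (@ARGV) {
    open(my $fh, '<:raw', $file) or die "Cannot open $file: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;

    my $msg = is_valid_utf8($bytes)
        ? "$file: valid UTF-8\n"
        : "$file: not valid UTF-8 (perhaps ISO 8859-1)\n";
    print $msg;
}
```

Saved under a hypothetical name such as check_utf8.pl, it would be run as: perl check_utf8.pl Chapter_001_not_centered.html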
This question was answered in the comments by @Shawn and @sticky bit: the input file is not actually UTF-8 but ISO 8859-1, so opening the files with ISO 8859-1 encoding instead of UTF-8 solves the problem. If one of you wants to post this as an answer, I will accept it.
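Following the suggestion in the comments, a minimal sketch of the script with both files opened as ISO 8859-1 could look like this. It assumes every *_not_centered.html file in the directory is Latin-1 encoded; $enc and the helper center_h2 are names introduced for this sketch. It also opens the output file only after the input has passed the filters, and uses a substitution so the rest of each <h2 line is kept.

```perl
use strict;
use warnings;

# Assumption: the *_not_centered.html files are stored in ISO 8859-1 (Latin-1)
my $enc = 'iso-8859-1';
my $dir = '.';

# Copy $in to $out, adding the centering style to every <h2 opening tag.
# Both files use the same Latin-1 layer, so é (byte 0xE9) passes through unchanged.
sub center_h2 {
    my ($in, $out) = @_;
    open(my $ifh, "<:encoding($enc)", $in)  or die "Cannot read $in: $!";
    open(my $ofh, ">:encoding($enc)", $out) or die "Cannot write $out: $!";
    while (my $line = <$ifh>) {
        $line =~ s/<h2\b/<h2 style="text-align: center;"/;
        print {$ofh} $line;
    }
    close $ifh;
    close $ofh or die "Cannot close $out: $!";
}

opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
while (my $inputfile = readdir($dh)) {
    # Only regular files whose names end in _not_centered.html
    next unless $inputfile =~ /_not_centered\.html\z/;
    next unless -f "$dir/$inputfile";

    # Chapter_001_not_centered.html -> Chapter_001.html
    (my $outputfile = $inputfile) =~ s/_not_centered(?=\.html\z)//;
    center_h2("$dir/$inputfile", "$dir/$outputfile");
}
closedir($dh);
```

If UTF-8 output is preferred instead, changing only the output layer to :encoding(UTF-8) would write é as the two bytes C3 A9; since these files declare no charset in their head, adding <meta charset="utf-8"> would then help browsers pick the right encoding (this sketch does not do that).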