Is there a way to get XML::Twig to understand a UTF-16-encoded XML file?
The code that reads the file follows what the tutorials show:
use warnings;
use strict;
use XML::Twig;
# ...
my $twig = XML::Twig->new(
    twig_handlers => { ... },
    pretty_print  => 'indented',
    keep_encoding => 1,
);
# ...
$twig->parsefile('myXmlFile.xml'); # <= line 71
The error is:
error parsing tag '<RIBBON>' at /usr/lib/perl5/vendor_perl/5.14/x86_64-cygwin-threads/XML/Parser/Expat.pm line 470
at ../../cv32/res/convert-xml-string2.pl line 71
at ../../cv32/res/convert-xml-string2.pl line 71
The XML starts off like so:
<?xml version="1.0" encoding="utf-16"?>
After I changed the file-opening code as Borodin suggested, it still doesn't work:
# parse the XML file
open(my $xmlIn, '<:encoding(UTF-16)', $xmlFile) or die "Couldn't open xml file '$xmlFile'. $!";
$twig->parse($xmlIn); # <= line 72
The error becomes:
encoding specified in XML declaration is incorrect at line 1, column 30, byte 30 at /usr/lib/perl5/vendor_perl/5.14/x86_64-cygwin-threads/XML/Parser.pm line 187
at ../../cv32/res/convert-xml-string2.pl line 72
Apparently, the XML parser used by XML::Twig (XML::Parser) doesn't support UTF-16. You need to convert the XML document to a supported encoding (e.g. UTF-8) first.
For example,
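use XML::LibXML qw( );

# Slurp the file as raw bytes; $qfn holds the path to the XML file.
my $xml;
{
    open(my $fh, '<:raw', $qfn)
        or die $!;
    local $/;
    $xml = <$fh>;
}

# Let XML::LibXML parse the UTF-16 document, then re-serialize it as UTF-8.
{
    my $doc = XML::LibXML->new()->parse_string($xml);
    $doc->setEncoding('UTF-8');
    $xml = $doc->toString();
}

# XML::Twig now receives a UTF-8 document with a matching declaration.
$twig->parse($xml);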
A lighter solution would be to detect (or expect) UTF-16, decode the document (using Encode's decode), use a regex to adjust the encoding declaration, then encode the document (using Encode's encode).
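The lighter approach could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name utf16_xml_to_utf8 is made up for this example, the input is expected to start with a byte order mark (Encode's 'UTF-16' decoder needs one to pick the byte order), and the XML declaration is assumed to sit at the very start of the document.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper (not part of XML::Twig or Encode): takes the raw bytes
# of a UTF-16 XML document and returns UTF-8 bytes with a matching declaration.
sub utf16_xml_to_utf8 {
    my ($bytes) = @_;

    # Decode to Perl characters. The 'UTF-16' decoder reads the BOM to pick
    # the byte order, removes it, and croaks if it is missing.
    my $text = decode('UTF-16', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

    # Rewrite only the encoding pseudo-attribute of the XML declaration.
    $text =~ s/\A(<\?xml\b[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2/${1}${2}UTF-8${2}/;

    # Encode back to bytes so the parser sees what the declaration promises.
    return encode('UTF-8', $text);
}
```

With the file slurped through a '<:raw' handle as in the XML::LibXML example, the result of utf16_xml_to_utf8 can be passed straight to $twig->parse. If the document has no XML declaration the substitution simply does nothing, which is harmless because UTF-8 is the default encoding for XML.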