In the Date::Holidays::DK module, the names of certain Danish holidays are written in Latin-1 encoding. For example, January 1st is 'Nytårsdag'. What should I do to $x below in order to get a proper UTF-8-encoded string?
use Date::Holidays::DK;
my $x = is_dk_holiday(2011,1,1);
I tried various combinations of use utf8 and no utf8 before/after use Date::Holidays::DK, but it does not seem to have any effect. I also tried to use Encode's decode, with no luck. More specifically,
use Date::Holidays::DK;
use Encode;
use Devel::Peek;
my $x = decode("iso-8859-1",
is_dk_holiday(2011,1,1)
);
Dump($x);
print "January 1st is '$x'\n";
gives the output
SV = PV(0x15eabe8) at 0x1492a10
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0x1593710 "Nyt\303\245rsdag"\0 [UTF8 "Nyt\x{e5}rsdag"]
CUR = 10
LEN = 16
January 1st is 'Nyt sdag'
(with an invalid character between t and s).
use utf8 and no utf8 before/after use Date::Holidays::DK, but it does not seem to have any effect.
Correct. The utf8 pragma only indicates that the source code of the program is written in UTF-8; it does not change the encoding of data that a module returns at run time.
I also tried to use Encode's decode, with no luck.
You misjudged the result: you did in fact do the right thing. After decoding, you have a string of Perl characters and can manipulate it.
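As a minimal sketch of what "a string of Perl characters" means here, the snippet below decodes a hard-coded Latin-1 byte string and inspects the result. The literal "Nyt\xE5rsdag" is an assumption that stands in for the return value of is_dk_holiday, so that the example runs without the module installed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: this literal stands in for the Latin-1 octets
# that is_dk_holiday(2011,1,1) returns.
my $octets = "Nyt\xE5rsdag";

# Decode: Latin-1 octets in, Perl character string out.
my $chars = decode('iso-8859-1', $octets);

# Character-level operations now see one character per letter.
my $len        = length $chars;              # 9 characters
my $code_point = ord substr($chars, 3, 1);   # 0xE5, the character å

printf "length: %d\n", $len;
printf "4th character: U+%04X\n", $code_point;
```

The Dump output in the question shows the same thing from the inside: the internal buffer is 10 bytes long (CUR = 10), but the string holds 9 characters.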
with an invalid character between t and s
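You are misreading this as well: it is in fact the å character. Because no output encoding was applied, print wrote it as the single byte 0xE5, which a terminal expecting UTF-8 displays as invalid.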
You want to output UTF-8, so you are lacking the encoding step.
my $octets = encode 'UTF-8', $x;
print $octets;
Please read http://p3rl.org/UNI for an introduction to the topic of encoding. You must always decode and encode, either explicitly or implicitly.
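The encode call above is the explicit route. A sketch of the implicit route, assuming the output should always be UTF-8, is to put an encoding layer on STDOUT so that every character string printed there is encoded automatically. As before, the literal "Nyt\xE5rsdag" is an assumed stand-in for the return value of is_dk_holiday.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Encode implicitly: the I/O layer converts every character string
# printed to STDOUT into UTF-8 octets.
binmode STDOUT, ':encoding(UTF-8)';

# Assumption: stand-in for the Latin-1 octets returned by is_dk_holiday.
my $octets = "Nyt\xE5rsdag";

# Decoding on input is still needed; only the output step became implicit.
my $x = decode('iso-8859-1', $octets);

print "January 1st is '$x'\n";
```

With the layer in place, do not also call encode on the string before printing, or the text will be encoded twice.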