Tags: perl, encoding, utf-8, perl-module, latin1

Perl: String literal in module in latin1 - I want utf8


In the Date::Holidays::DK module, the names of certain Danish holidays are written in Latin1 encoding. For example, January 1st is 'Nytårsdag'. What should I do to $x below in order to get a proper utf8-encoded string?

use Date::Holidays::DK;
my $x = is_dk_holiday(2011,1,1);

I tried various combinations of use utf8 and no utf8 before/after use Date::Holidays::DK, but it does not seem to have any effect. I also tried to use Encode's decode, with no luck. More specifically,

use Date::Holidays::DK;
use Encode;
use Devel::Peek;
my $x = decode("iso-8859-1", 
           is_dk_holiday(2011,1,1)
          );
Dump($x);
print "January 1st is '$x'\n";

gives the output

SV = PV(0x15eabe8) at 0x1492a10
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x1593710 "Nyt\303\245rsdag"\0 [UTF8 "Nyt\x{e5}rsdag"]
  CUR = 10
  LEN = 16
January 1st is 'Nyt sdag'

(with an invalid character between t and s).


Solution

  • I tried various combinations of use utf8 and no utf8 before/after use Date::Holidays::DK, but it does not seem to have any effect.

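    Correct. The utf8 pragma only indicates that the source code of the file declaring it is written in UTF-8. It is lexically scoped, so it has no effect on string literals inside Date::Holidays::DK.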

    I also tried to use Encode's decode, with no luck.

    That is a misreading of the result: decoding was in fact the right thing to do, as the UTF8 flag and the \x{e5} in the Dump output show. You now have a string of Perl characters and can manipulate it.

    with an invalid character between t and s

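    That is also a misreading: the character is in fact å (U+00E5). Printed without an encoding step, it leaves Perl as the single Latin-1 byte 0xE5, which a terminal expecting UTF-8 cannot display.

    You want to output UTF-8, so what you are lacking is the encoding step.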

    my $octets = encode 'UTF-8', $x;
    print $octets;
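
The full round trip can be sketched without the module installed. This is only an illustration: the byte string "Nyt\xE5rsdag" is an assumed stand-in for what is_dk_holiday(2011,1,1) returns, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: stand-in for the Latin-1 byte string returned by
# is_dk_holiday(2011,1,1). The \xE5 escape keeps this file pure ASCII.
my $latin1_bytes = "Nyt\xE5rsdag";

# Decode: Latin-1 octets -> Perl character string.
my $chars = decode('iso-8859-1', $latin1_bytes);

# Encode: Perl character string -> UTF-8 octets, ready for output.
my $octets = encode('UTF-8', $chars);

# Prints "characters: 9, UTF-8 octets: 10"
printf "characters: %d, UTF-8 octets: %d\n", length($chars), length($octets);
print $octets, "\n";
```

The ten octets match the CUR = 10 reported by Dump in the question: å is one character, but two bytes in UTF-8.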
    

    Please read http://p3rl.org/UNI for an introduction to the topic of encoding. You must always decode and encode, either explicitly or implicitly.
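
As a sketch of the implicit variant, an encoding layer on the output handle can do the encode step for every print. The decoded literal is again an assumed stand-in for the module's return value; binmode with an :encoding layer is standard Perl.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: stand-in for decode("iso-8859-1", is_dk_holiday(2011,1,1)).
my $x = decode('iso-8859-1', "Nyt\xE5rsdag");

# Encode implicitly: every character string printed to STDOUT from now on
# is converted to UTF-8 by the I/O layer.
binmode(STDOUT, ':encoding(UTF-8)');
print "January 1st is '$x'\n";
```

With the layer in place, do not also call encode on the string before printing, or it will be encoded twice.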