Tags: perl, utf-8, cyrillic, writetofile

Cyrillic symbols shown strangely when writing to a file


I have a class with a string field, input, that contains UTF-8 characters. The class also has a toString method, which I want to use to save instances of the class to a file. The problem is that strange symbols end up in the file:

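use Fcntl qw(:flock);   # provides LOCK_EX and LOCK_UN

my $dest = "output.txt";

print "\nBefore saving to file\n" . $message->toString() . "\n";

open(my $fh, '>>:encoding(UTF-8)', $dest)
    or die "Cannot open $dest : $!";

flock($fh, LOCK_EX) or die "Cannot lock $dest : $!";
print $fh $message->toString();
flock($fh, LOCK_UN) or die "Cannot unlock $dest : $!";
close $fh;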

The first print works fine; the console shows:

Input: {"paramkey":"message","paramvalue":"здравейте"}

The problem appears when I write to the file, which ends up containing:

Input: {"paramkey":"message","paramvalue":"здÑавейÑе"}

I used flock for locking/unlocking the file.


Solution

  • The string returned by your toString method is already UTF-8 encoded. That works fine when you print it to your terminal, because the terminal expects UTF-8 data. But when you open your output file with

    open(my $fh, '>>:encoding(UTF-8)', $dest) or die "Cannot open $dest : $!";

    you are asking Perl to encode the data as UTF-8 a second time. Perl treats each byte of the already-encoded string as a separate character and encodes it again, so every byte above 0x7F becomes a two-byte UTF-8 sequence, which isn't what you want at all. Unfortunately you don't show the code for the class that $message belongs to, so I can't tell you where the string is being encoded.

    You can fix that by changing your open call to just

    open(my $fh, '>>', $dest) or die "Cannot open $dest : $!";

    which avoids the additional encoding step. But you should really be working with decoded character strings throughout your Perl code: decode data as you read it from input files, and encode it as necessary when you write to output files.
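
To make the difference concrete, here is a minimal sketch of the three cases: already-encoded bytes printed through the UTF-8 layer (the bug), the same bytes printed with no layer (the quick fix), and a decoded character string printed through the UTF-8 layer (the recommended approach). It rests on some assumptions: the helper bytes_written is hypothetical, an in-memory filehandle stands in for output.txt, and the hard-coded byte string stands in for whatever toString returns.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "здравейте" as UTF-8 bytes, written with escapes so the example does not
# depend on the encoding of this source file.
my $bytes = "\xD0\xB7\xD0\xB4\xD1\x80\xD0\xB0\xD0\xB2\xD0\xB5\xD0\xB9\xD1\x82\xD0\xB5";

# Decode once at the input boundary: 18 bytes become 9 characters.
my $chars = decode('UTF-8', $bytes);

# Hypothetical helper: print $string to an in-memory file opened with $mode
# and return the raw bytes that a real file would contain.
sub bytes_written {
    my ($mode, $string) = @_;
    my $buffer = '';
    open(my $fh, $mode, \$buffer) or die "Cannot open in-memory file: $!";
    print {$fh} $string;
    close($fh) or die "Cannot close in-memory file: $!";
    return $buffer;
}

# The bug: bytes that are already UTF-8 go through the encoding layer again.
my $double = bytes_written('>:encoding(UTF-8)', $bytes);

# The quick fix: no layer, so the bytes are written unchanged.
my $raw = bytes_written('>', $bytes);

# The recommended approach: decoded characters, encoded once on output.
my $encoded = bytes_written('>:encoding(UTF-8)', $chars);

printf "encoded bytes, UTF-8 layer: %d bytes\n", length($double);
printf "encoded bytes, no layer:    %d bytes\n", length($raw);
printf "characters, UTF-8 layer:    %d bytes\n", length($encoded);
```

The first case should report twice as many bytes as the other two, because every byte of this particular string is above 0x7F and is therefore expanded to a two-byte sequence. The second and third cases produce identical files; the difference is that in the third the program works with real characters, so functions such as length, substr and regular expressions see 9 letters rather than 18 bytes.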