Perl - Correcting char encoding on command line input


I am writing a program to fix mangled encoding, specifically Greek text (ISO-8859-7) that was wrongly read as Latin-1 (ISO-8859-1).

I created a function that works as intended; a variable with badly encoded text is converted properly.

When I try to convert $ARGV[0] with this function it doesn't seem to correctly interpret the input.

Here is a test program to demonstrate the issue:

#!/usr/bin/env perl

use 5.018;
use utf8;
use strict;
use open qw(:std :encoding(utf-8));
use Encode qw(encode decode);

sub unmangle {
 my $input = shift;

 print $input . "\n";
 print decode('iso-8859-7', encode('latin1',$input)) . "\n";
}


my $test = "ÁöéÝñùìá";  # should be Αφιέρωμα

say "fix variable:";
unmangle($test);

say "\nfix argument:";
unmangle($ARGV[0]);

When I run this program with the same input as my $test variable, the results are not the same, although I expected them to be:

$ ./fix_bad_encoding.pl "ÁöéÝñùìá"
fix variable:
ÁöéÝñùìá
Αφιέρωμα

fix argument:
ÃöéÃñùìá
ΓΓΆΓ©ΓñùìÑ

How do I get $ARGV[0] to behave the way the $test variable does?


Solution

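  • `use utf8` decodes the source code, and `use open qw(:std :encoding(utf-8))` sets up decoding for STDIN (which you don't use) and encoding for STDOUT and STDERR. Neither touches @ARGV, so the argument still arrives as raw UTF-8 bytes and has to be decoded explicitly: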

    $_ = decode("UTF-8", $_) for @ARGV;
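
To see the effect end to end, here is a minimal sketch of the test program with the arguments decoded before use. It assumes the terminal passes arguments as UTF-8. The `decode_args` helper and the `$raw` stand-in for a shell argument are illustrative names that are not in the original program.

```perl
#!/usr/bin/env perl
use 5.018;
use utf8;                              # source literals are decoded text
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));    # covers STDIN/STDOUT/STDERR, not @ARGV
use Encode qw(encode decode);

# Undo text that was read as latin1 but was really iso-8859-7 (Greek).
sub unmangle {
    my ($input) = @_;
    return decode('iso-8859-7', encode('latin1', $input));
}

# Hypothetical helper: decode raw argument bytes.
# Assumption: the shell delivers arguments encoded as UTF-8.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

# Stand-in for a real command-line argument: the UTF-8 bytes of the
# mangled text, which is what Perl receives in @ARGV.
my $raw = encode('UTF-8', 'ÁöéÝñùìá');

# Use the real arguments when given, otherwise the simulated one.
my ($arg) = decode_args(@ARGV ? @ARGV : $raw);

say length($raw);     # counts bytes of the undecoded argument
say length($arg);     # counts characters after decoding
say unmangle($arg);
```

The undecoded string is twice as long as the decoded one because each of these characters takes two bytes in UTF-8. Without the decoding step, `unmangle` treats every byte as a separate latin1 character, which is where the garbled second result in the question comes from. As an alternative to decoding by hand, Perl's `-CA` switch (or `PERL_UNICODE=A` in the environment) tells the interpreter to expect UTF-8 in @ARGV.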