I'm trying to match a Unicode string using Perl regex. The string seems to arrive at my module unscathed with proper encoding if I output it to STDOUT: "Asuncion, Distrito Capital de Paraguay, Región Oriental, Paraguay."
However, it won't match in my regex. Oddly, if I copy the output of the script into a variable and evaluate that, it does match the same regex:
use v5.12;
use utf8;
my $placeString = $main::FORM{'placeString'}; # Coming from a different module.
say STDOUT $placeString;
utf8::upgrade($placeString); # Using this or removing this seems to make no difference.
# If I manually copy the output of STDOUT (above) in BASH and set the string, it works:
my $placeString2 = "Asuncion, Distrito Capital de Paraguay, Región Oriental, Paraguay";
if ($placeString =~ /^([\w\s\,\.\-\']+)$/) {
# This is evaluated as false.
say STDERR "Accepted placename.";
}
if ($placeString2 =~ /^([\w\s\,\.\-\']+)$/) {
# This is evaluated as true.
say STDERR "Accepted placename.";
}
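From the comments it becomes clear that the value is a UTF-8 encoded byte string, not a string of decoded characters. The regex therefore sees the two bytes of ó rather than one letter, and \w does not accept them. The pasted literal works because use utf8 tells Perl to decode string literals in the source file, and utf8::upgrade does not help because it only changes the internal representation without decoding anything. You need to decode the value before doing the match: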
use Encode qw(decode_utf8);
$placeString = decode_utf8($placeString);
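
To see the difference for yourself, here is a minimal sketch. It assumes the form value arrives as raw UTF-8 bytes, which it simulates with encode. The sample string and the variable names $bytes, $chars and $pattern are only for illustration.

```perl
use v5.12;
use utf8;                       # literals in this file are decoded characters
use Encode qw(encode decode);

# Same character class as in the question, without the redundant escapes.
my $pattern = qr/^([\w\s,.\-']+)$/;

# Assumption: this is what the undecoded form value looks like (raw bytes).
my $bytes = encode('UTF-8', 'Región Oriental, Paraguay');

# Decoding turns the byte sequence back into characters.
my $chars = decode('UTF-8', $bytes);

say 'bytes match: ', ($bytes =~ $pattern ? 'yes' : 'no');
say 'chars match: ', ($chars =~ $pattern ? 'yes' : 'no');

# The byte string is one element longer because ó takes two bytes in UTF-8.
say 'length difference: ', length($bytes) - length($chars);
```

Only the decoded string should match. If you print decoded strings that contain non-ASCII characters, you will probably also want an encoding layer on the output handle, for example binmode(STDOUT, ':encoding(UTF-8)').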