I'm writing complex configuration files for a data transformation tool with Perl 5.20.
The configuration file has placeholders for several aspects at load time and runtime encapsulating some path code like
# Load time placeholder example
CONFIG: NAME ${/path/pos/*/test/*[123] == 'ABC' }
# Runtime placeholder example
COLUMN: CSV_NAME STRING DEFAULT :{./CSV_FIRST}
For some reason it should work also on multi line expressions.
I've written scanner using Text::ParseWord
, the standard separator \s+
and want to escape the placeholder expressions before splitting the data line(s) into single words by base64
encoded expressions not containing \s+
. The expressions are also the keys for the subsequent data substitution.
The escaping is driven by a pattern match defined by (...see code below):
my @pat = $line =~ /([^\\]\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/gs;
Which IMO defines a multi line pattern ${...}
when I use $pfx = '$'
for example, but masks (escapes) \${...}
expressions.
I'm struggling a while with the inner part of the pattern ...
and get [^\Q$pfx\E\{\}]+
to work, but have the feeling that is not the correct one, because
What is the right expression to do this?
#!/usr/bin/env perl
use strict;
use warnings;
use MIME::Base64;
use feature qw(signatures);
no warnings 'once';
no warnings 'experimental';
no warnings 'experimental::signatures';
my $line =
'# test data
This are:
1. ${/multiline/used/*[3]
= "12345"}
2. ${/single/line/compile/time/pattern/*[3]}
3. ${/single/line/runtime/pattern/x == 1234}
4. ${/multi/line/runtime/pattern \
defer/1 \
defer/2 \
defer/3
}
5. ${//PG.GRM/*[
key eq "TEST.VAR"
]}
';
sub testSpacedPlaceHolder($pfx, $line) {
my %match;
my @pat = $line =~ /([^\\]\Q$pfx\E\{[^\Q$pfx\E\{\}]+\})/gs;
my %seen = ();
my @uniq = grep { ! $seen{$_} ++ } @pat;
for my $key (@uniq) {
my $hkey=$pfx.encode_base64($key);
$hkey =~ s/\n//g;
my $var = substr($key, 3, -1);
$match{$hkey}= [ $var, $key ];
$line =~ s/\Q$key\E/$hkey/g;
}
# Test the output ------------------------------
my $cnt = 0;
print "\nRESULT:\n";
for my $key (sort keys %match) {
$cnt++;
my ($var, $orig) = @ { $match{$key} };
print "---- $cnt ----\n";
print "ORG: $orig\n";
print "VAR: $var\n";
print "ESC: $key\n";
}
print "\nLINE:\n$line\n";
return ($line, \%match);
}
testSpacedPlaceHolder('$', $line);
/usr/bin/env perl "test-strings.pl"
RESULT:
---- 1 ----
ORG: ${/multi/line/runtime/pattern \
defer/1 \
defer/2 \
defer/3
}
VAR: /multi/line/runtime/pattern \
defer/1 \
defer/2 \
defer/3
ESC: $ICR7L211bHRpL2xpbmUvcnVudGltZS9wYXR0ZXJuIFwKICAgICAgZGVmZXIvMSBcCiAgICAgIGRlZmVyLzIgXAogICAgICBkZWZlci8zCiAgICAgfQ==
---- 2 ----
ORG: ${/multiline/used/*[3]
= "12345"}
VAR: /multiline/used/*[3]
= "12345"
ESC: $ICR7L211bHRpbGluZS91c2VkLypbM10KICAgICAgPSAiMTIzNDUifQ==
---- 3 ----
ORG: ${/single/line/compile/time/pattern/*[3]}
VAR: /single/line/compile/time/pattern/*[3]
ESC: $ICR7L3NpbmdsZS9saW5lL2NvbXBpbGUvdGltZS9wYXR0ZXJuLypbM119
---- 4 ----
ORG: ${/single/line/runtime/pattern/x == 1234}
VAR: /single/line/runtime/pattern/x == 1234
ESC: $ICR7L3NpbmdsZS9saW5lL3J1bnRpbWUvcGF0dGVybi94ID09IDEyMzR9
---- 5 ----
ORG: ${//PG.GRM/*[
key eq "TEST.VAR"
]}
VAR: //PG.GRM/*[
key eq "TEST.VAR"
]
ESC: $ICR7Ly9QRy5HUk0vKlsKICAgICAgICBrZXkgZXEgIlRFU1QuVkFSIgogICAgICAgXX0=
LINE:
# test data
This are:
1.$ICR7L211bHRpbGluZS91c2VkLypbM10KICAgICAgPSAiMTIzNDUifQ==
2.$ICR7L3NpbmdsZS9saW5lL2NvbXBpbGUvdGltZS9wYXR0ZXJuLypbM119
3.$ICR7L3NpbmdsZS9saW5lL3J1bnRpbWUvcGF0dGVybi94ID09IDEyMzR9
4.$ICR7L211bHRpL2xpbmUvcnVudGltZS9wYXR0ZXJuIFwKICAgICAgZGVmZXIvMSBcCiAgICAgIGRlZmVyLzIgXAogICAgICBkZWZlci8zCiAgICAgfQ==
5.$ICR7Ly9QRy5HUk0vKlsKICAgICAgICBrZXkgZXEgIlRFU1QuVkFSIgogICAgICAgXX0=
Assuming I have a script that defines some kind of configuration:
MAGIC: MAGIC.TYPE
CONTAINER: NAME BEGIN
DEFINE: VAR1 'USER.NAME'
DEFINE: VAR2 '65789'
INTERNAL.CONTAINER: INTERNAL.NAME BEGIN
TAG1: 'ABCDEF'
TAG2: ${/NAME/VAR1}
# Unwanted nested variant
TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }
# Valid runtime interpolation variant
TAG4: "${/NAME/VAR1}/:{NAME.KEY}"
# Valid runtime path variant but ignored
TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}
END.INTERNAL.NAME
END.NAME
I want to avoid the nested line
# Nested variant
TAG3: ${/NAME/VAR1 ${/NAME/VAR2}}
for variable resolve reasons, but keep
# Valid runtime path variant but ignored
TAG5: ${/NAME/VAR2/*/:{TEST{KEY}}
because they are runtime driven.
My variant blocks TAG5 due to the simple sequence [\$\{\}]+
.
Here is an example of how you can use a recursive regex to exclude nested versions of ${...}
:
use feature qw(say);
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my $str = <<'END_STR';
MAGIC: MAGIC.TYPE
CONTAINER: NAME BEGIN
DEFINE: VAR1 'USER.NAME'
DEFINE: VAR2 '65789'
INTERNAL.CONTAINER: INTERNAL.NAME BEGIN
TAG1: 'ABCDEF'
TAG2: ${/NAME/VAR1}
# Unwanted nested variant
TAG3: ${/NAME/VAR1 ${/NAME/VAR2} }
# Valid runtime interpolation variant
TAG4: "${/NAME/VAR1}/:{NAME.KEY}"
# Valid runtime path variant but ignored
TAG5: ${/NAME/VAR1/*/:{TEST{KEY}}}
END.INTERNAL.NAME
END.NAME
END_STR
my @matches;
while ($str =~ /(?:^|(?<!\\))
(?<G3>(?<G1> \$ \{ (?:
(?>(?:[^{}\$\\] | (?:\\.) |
(?<G2> \{ (?: (?>[^{}]+) | (?&G2))* \} ))
| (?<G4>(?&G1))))* \}))/msxg) {
next if defined $+{G4}; # skip nested matches
push @matches, $+{G3};
}
print Dumper(\@matches);
Output:
$VAR1 = [
'${/NAME/VAR1}',
'${/NAME/VAR1}',
'${/NAME/VAR1/*/:{TEST{KEY}}}'
];
Notice that the results include TAG2, TAG4, and TAG5, but not TAG3.