Perl regular expression to replace exact number within CDATA string


I have a Perl script that needs to be able to replace values contained within CDATA tags in XML. I have the following issue:

my $str = "<![CDATA[Replace 00 and 00 but don't replace 1001100.]]>";
my $source = "00";
my $target = "989898";

$str =~ s/(<!\[(?i)CDATA(?-i)\[.*)$source(.*\].*)/$1$target$2/g;

The output that I am looking for is:

<![CDATA[Replace 989898 and 989898 but don't replace 1001100.]]>

What I am getting is:

<![CDATA[Replace 00 and 00 but don't replace 10011989898.]]>

I would also need to be able to replace $source if $str were to equal the following:

$str = "<![CDATA[HEREISSOMETEXT00]]>";

Desired output would be:

<![CDATA[HEREISSOMETEXT989898]]>

I would also need to make some changes to paths as follows:

my $str = "<![CDATA[/this/is/my/CHANGE_ME/path]]>";
my $source = "CHANGE_ME";
my $target = "NEW_ME";

Desired output would be:

<![CDATA[/this/is/my/NEW_ME/path]]>

But I also need the following input to be left unchanged:

my $str = "<![CDATA[/this/is/my/DONOTCHANGE_ME/path]]>";
my $source = "CHANGE_ME";
my $target = "NEW_ME";

Desired output:

<![CDATA[/this/is/my/DONOTCHANGE_ME/path]]>

Basically, I need exact matches within substrings and I cannot use any of the Perl libraries that are not delivered with Perl "out of the box."

I had also written this much simpler regex:

$str =~ s/$source/$target/g if $str =~ m/<!\[CDATA/i;

This works well when I just need to replace a string like "ABC" or even "AB0", but it wreaks havoc when I need to change "00" to "10": it turns "00" into "10" (desired) but also turns "1000" into "1100" (not desired).

Any help would be greatly appreciated! Thanks...


Solution

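  • If you want to replace only whole words, use the word boundary \b:

    s/\b00\b/10/g;

    Note that \b treats letters, digits, and underscores alike as word characters, so it will not match the 00 in HEREISSOMETEXT00.

    Or, if you want to replace only when no digits precede or follow the string, use look-around assertions:

    s/ (?<![0-9]) 00 (?![0-9]) /10/gx;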
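
The look-around idea above can be combined with the question's $source and $target variables. The following is only a minimal sketch, not part of the original answer: the helper name replace_in_cdata is hypothetical, and the rule for choosing the boundary (a digit-edged source must not touch another digit, any other source must not touch a word character) is an assumption made to cover all four examples in the question. It uses core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): replace $source with
# $target only inside CDATA sections, and only on "exact" matches.
sub replace_in_cdata {
    my ($str, $source, $target) = @_;

    # Assumption: if $source starts (or ends) with a digit, it must not be
    # preceded (or followed) by another digit; otherwise it must not be
    # preceded (or followed) by a word character.
    my $before = $source =~ /\A[0-9]/ ? qr/(?<![0-9])/ : qr/(?<!\w)/;
    my $after  = $source =~ /[0-9]\z/ ? qr/(?![0-9])/  : qr/(?!\w)/;

    # \Q...\E quotes any regex metacharacters contained in $source.
    my $pattern = qr/$before\Q$source\E$after/;

    # Rewrite only the text between <![CDATA[ and ]]>. The /i flag mirrors
    # the case-insensitive CDATA match used in the question.
    $str =~ s{(<!\[CDATA\[)(.*?)(\]\]>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $body =~ s/$pattern/$target/g;
        $open . $body . $close;
    }gsie;

    return $str;
}

print replace_in_cdata(q{<![CDATA[Replace 00 and 00 but don't replace 1001100.]]>},
                       '00', '989898'), "\n";
# prints: <![CDATA[Replace 989898 and 989898 but don't replace 1001100.]]>

print replace_in_cdata('<![CDATA[HEREISSOMETEXT00]]>', '00', '989898'), "\n";
# prints: <![CDATA[HEREISSOMETEXT989898]]>

print replace_in_cdata('<![CDATA[/this/is/my/DONOTCHANGE_ME/path]]>',
                       'CHANGE_ME', 'NEW_ME'), "\n";
# prints: <![CDATA[/this/is/my/DONOTCHANGE_ME/path]]>
```

Copying $1, $2, and $3 into lexical variables before the inner substitution matters, because the inner s/// would otherwise overwrite the capture groups of the outer match.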