
Working RegEx that fails in Perl find & replace one-liner


I have the following RegEx (<th>Password<\/th>\s*<td>)\w*(<\/td>), which matches <th>Password</th> followed by <td>root</td> in this HTML:

<tr>
    <th>Password</th>
    <td>root</td>
</tr>

However this Terminal command fails to find a match:

perl -pi -w -e 's/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/$1NEWPASSWORD$2/g' file.html

It appears to have something to do with the whitespace between </th> and <td>, but <\/th>\s*<td> matches when I test the RegEx on its own, so why not in Perl?

I have tried replacing \s* with \n*, \r*, \t* and various combinations of these, but there is still no match.

Any help would be gratefully appreciated.


Solution

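  • The substitution is only applied to one line of your file at a time. The -p option reads the file line by line, so $_ never contains both </th> and the following <td>, and \s* has no newline to match.

    You can read the entire file in at once using the -0777 option (a record separator value that makes Perl slurp each file whole), like this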

    perl -w -0777 -pi -e 's/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/$1NEWPASSWORD$2/g' file.html
    

    Note that it is far preferable to use a proper HTML parser, such as HTML::TreeBuilder::XPath, to process data like this, as it is very difficult to account for all possible representations of a given HTML construct using regular expressions.