php regex preg-replace capturing-group

preg_replace() to add to string using non-capturing group


I have a piece of HTML markup to which I need to add a specific CSS rule. The HTML is like this:

<tr>
<td style="color:#555555;padding-top: 3px;padding-bottom: 20px;">In order to stop receiving similar emails, simply remove the relevant <a href="https://domain.tld/dashboard/" target="_blank">saved search</a> from your account.</td>
</tr>

As you can see, the td already has a style attribute, so my idea is to match its last ; and replace it with a ; plus the rule I need to add...

The problem is that, although I used the appropriate non-capturing group, I still can't figure out how to do this properly... Take a look at this experiment please: https://regex101.com/r/qlVq6A/1

(<td.*style=".*)(;)(".*>)(?:In order to stop receiving)

On the other hand, when I assign a capturing group to the last part (the text in English that's there just to identify which td I'm interested in) it works OK, but I feel like this is an indirect way to make this work... Take a look at this experiment: https://regex101.com/r/qhVatN/1

(<td.*style=".*)(;)(".*>In order to stop receiving)

Can someone explain to me why the first route doesn't work? Basically, why the non-capturing group still captures the text inside of it...


Solution

  • In your second pattern there are 3 capture groups. The 3rd group contains In order to stop receiving, so when you use group 3 in the replacement (after the style that you want to add), that text is written back.

    In your first pattern the same text sits in a non-capturing group (?:...). It is still matched, so it is part of the text that preg_replace() replaces, but there is no group number you can use to put it back in the replacement. That is why it disappears from the result.

    Note that a non-capturing group used like that can be omitted altogether: without, for example, a quantifier or an alternation, the grouping by itself serves no purpose.
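
A minimal PHP sketch of the difference between the two patterns from the question. The replacement string `$1;font-size: 80%;$3`, the `~` delimiter and the variable names are assumptions made for illustration, since the question does not show the replacement that was actually used.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<tr>
<td style="color:#555555;padding-top: 3px;padding-bottom: 20px;">In order to stop receiving similar emails, simply remove the relevant <a href="https://domain.tld/dashboard/" target="_blank">saved search</a> from your account.</td>
</tr>
HTML;

// Assumed replacement: group 1, a semicolon, the new rule, then group 3.
$replacement = '$1;font-size: 80%;$3';

// First pattern: the marker text is matched by (?:...) but not captured.
$first = '~(<td.*style=".*)(;)(".*>)(?:In order to stop receiving)~';
// Second pattern: the marker text is part of capture group 3.
$second = '~(<td.*style=".*)(;)(".*>In order to stop receiving)~';

$withFirst = preg_replace($first, $replacement, $html);
$withSecond = preg_replace($second, $replacement, $html);

// The whole match is replaced, so text that is matched but never
// referenced in the replacement is lost.
var_dump(strpos($withFirst, 'In order to stop receiving') !== false);  // bool(false)
var_dump(strpos($withSecond, 'In order to stop receiving') !== false); // bool(true)
```

Both calls add the rule to the style attribute; only the second one keeps the marker text, because only there is it available as `$3`.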

    You can use a pattern for the example string, but this can be error-prone, and using a DOM parser would be a better option.
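
For comparison, a rough sketch of the DOM-based route using PHP's bundled DOMDocument and DOMXPath classes. The wrapping `<table>` element, the XPath expression and the variable names are assumptions made for this example and are not part of the original markup or answer.

```php
<?php
// The row from the question, wrapped in <table> (an assumption) so that
// the fragment has a single, valid root element.
$html = <<<'HTML'
<table><tr>
<td style="color:#555555;padding-top: 3px;padding-bottom: 20px;">In order to stop receiving similar emails, simply remove the relevant <a href="https://domain.tld/dashboard/" target="_blank">saved search</a> from your account.</td>
</tr></table>
HTML;

$doc = new DOMDocument();
// Parse the fragment without adding <html>/<body> wrappers or a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);
// Select the td elements whose text starts with the marker sentence.
$cells = $xpath->query('//td[starts-with(normalize-space(.), "In order to stop receiving")]');

foreach ($cells as $td) {
    $style = rtrim($td->getAttribute('style'));
    // Make sure the existing rules end with a semicolon before appending.
    if ($style !== '' && substr($style, -1) !== ';') {
        $style .= ';';
    }
    $td->setAttribute('style', $style . 'font-size: 80%;');
}

$result = $doc->saveHTML();
echo $result;
```

This does not depend on the attribute order or on the exact spacing inside the tag, which is where a regex tends to break.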

    A way to write the pattern with just 2 capture groups:

    (<td[^>]*\bstyle="[^"]*;)([^"]*">In order to stop receiving)
    

    In the replacement use:

    $1font-size: 80%;$2
    

    Explanation

    • ( Capture group 1
      • <td[^>]* Match <td and then optionally repeat any char except >
      • \bstyle="[^"]*; Match style=" and then optionally repeat matching any char except " and then match the last semicolon (note that it is part of group 1 now)
    • ) Close group 1
    • ( Capture group 2
      • [^"]*">In order to stop receiving Optionally repeat matching any char except " and then match "> followed by the expected text
    • ) Close group 2

    See a regex demo.
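
As a sketch, the two-group pattern and its replacement could be used with preg_replace() like this. The `~` delimiter, the shortened sample string and the variable names are assumptions made for the example.

```php
<?php
// Shortened version of the td from the question.
$html = '<td style="color:#555555;padding-top: 3px;padding-bottom: 20px;">In order to stop receiving similar emails.</td>';

// Group 1 ends with the last semicolon of the style attribute,
// group 2 holds the closing quote, the > and the marker text.
$pattern = '~(<td[^>]*\bstyle="[^"]*;)([^"]*">In order to stop receiving)~';

// Single quotes keep PHP from treating $1 and $2 as variables.
$replacement = '$1font-size: 80%;$2';

$result = preg_replace($pattern, $replacement, $html, -1, $count);

echo $result, PHP_EOL;
echo $count, PHP_EOL; // number of replacements made
```

If the text that follows a group reference ever starts with a digit, the `${1}` form avoids ambiguity in the replacement string.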


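    Another option is to write the pattern without capture groups, using \K to forget what has been matched so far and a positive lookahead (?=...) to assert the expected text to the right. The match is then an empty position right after the last semicolon, so the replacement is just font-size: 80%; on its own: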

    <td[^>]*\bstyle="[^"]*;\K(?=[^"]*">In order to stop receiving)
    

    See another regex demo.
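
A short sketch of the \K variant in PHP. Since nothing is captured and the reported match is empty, the replacement consists of the new rule only; the `~` delimiter, the shortened sample string and the variable names are assumptions made for the example.

```php
<?php
// Shortened version of the td from the question.
$html = '<td style="color:#555555;padding-top: 3px;padding-bottom: 20px;">In order to stop receiving similar emails.</td>';

// \K resets the start of the reported match, and the lookahead consumes
// nothing, so the match is the empty position after the last semicolon.
$pattern = '~<td[^>]*\bstyle="[^"]*;\K(?=[^"]*">In order to stop receiving)~';

// The new rule is inserted at that position; nothing is removed.
$result = preg_replace($pattern, 'font-size: 80%;', $html, -1, $count);

echo $result, PHP_EOL;
echo $count, PHP_EOL; // number of insertions made
```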