Tags: php, html, regex, optimization, lookbehind

Regex has problems with positive lookbehind


I am currently trying to create a regex that strips unnecessary quotation marks from HTML tags. The regex will be used in PHP code.

<input type="image" src="/flags/en.png" alt="English" title="English" name="en" class="screen selected" />

should be converted to

<input type=image src="/flags/en.png" alt=English title=English name=en class="screen selected" />

I have come up with this regex and replacement:

/(?<=<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"(?=(?:\s[^>]+)?>)/g
$1$2

The problem is that the positive lookbehind does not allow quantifiers such as `+?`, because PCRE does not support lookbehinds of unbounded length (see http://regex101.com/ for reference).

So I modified the pattern slightly, turning the lookarounds into capturing groups:
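
The failure can be reproduced directly in PHP. The snippet below is a minimal sketch, assuming a shortened sample tag and dropping the regex101-style `/g` flag (which PHP does not accept); the exact wording of the warning depends on the PCRE version, so it is printed rather than relied upon:

```php
<?php
// The lookbehind pattern from the question, without the /g flag.
$pattern = '/(?<=<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"(?=(?:\s[^>]+)?>)/';
// Shortened sample input (an assumption for this demo).
$subject = '<input type="image" name="en" />';

// @ hides the compilation warning so the return value can be inspected.
$result = @preg_match($pattern, $subject);

// false (rather than 0 or 1) means the pattern failed to compile.
var_dump($result);

// The suppressed warning is still available through error_get_last().
$error = error_get_last();
echo $error['message'] ?? 'no error recorded', "\n";
```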

/(<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"((?:\s[^>]+)?>)/g
$1$2$3$4

Now it's valid, but it only strips one set of quotes from each tag.

How do I accomplish this?


Solution

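  • Each match consumes the entire tag, and matches cannot overlap, so a single pass can unquote only one attribute per tag. Repeat the replacement until nothing is left to replace:

    // No /g flag is needed: preg_replace already replaces every non-overlapping match.
    $pattern = '/(<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"((?:\s[^>]+)?>)/';
    $replacement = '$1$2$3$4';
    $subject = '<input type="image" src="/flags/en.png" alt="English" title="English" name="en" class="screen selected" />';

    // Each pass strips at most one pair of quotes per tag, so loop
    // until a pass reports zero replacements.
    do {
        $subject = preg_replace($pattern, $replacement, $subject, -1, $count);
    } while ($count > 0);
    var_dump($subject);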
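
An alternative that avoids repeated passes, offered only as a sketch: match each tag once with `preg_replace_callback`, then unquote the simple attribute values inside that tag with a second, much simpler pattern. The function name `stripSimpleQuotes` is hypothetical, and the tag pattern naively assumes that no attribute value contains a `>` character.

```php
<?php
// Hypothetical helper, not part of the original answer.
function stripSimpleQuotes(string $html): string
{
    // Outer pass: hand every tag to the callback as a whole.
    // Assumption: attribute values never contain '>'.
    $result = preg_replace_callback(
        '/<[^>]+>/',
        function (array $tag): string {
            // Inner pass: unquote values made up only of word characters
            // and hyphens, when followed by whitespace or the closing '>'.
            $clean = preg_replace(
                '/(\s[\w-]+=)"([\w-]+)"(?=[\s>])/',
                '$1$2',
                $tag[0]
            );
            // Fall back to the untouched tag if PCRE reports an error.
            return $clean ?? $tag[0];
        },
        $html
    );

    return $result ?? $html;
}

$subject = '<input type="image" src="/flags/en.png" alt="English" title="English" name="en" class="screen selected" />';
echo stripSimpleQuotes($subject), "\n";
```

Because the inner pattern only looks at one attribute at a time, its matches no longer swallow the rest of the tag, so a single pass per tag is enough. Values containing spaces or slashes, such as the `src` and `class` attributes above, keep their quotes.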