PHP preg_match - capturing attributes that use either double or single quotes


I have code that looks through HTML files and finds elements that match a regular expression I pass in. The main thing that decides whether an element matches is its id, which has to contain some special characters that I chose. The problem is that some people write ids in HTML with double quotes and others with single quotes, and I want to catch both cases. So my regular expression is this:

preg_match('@(<)([^\s]*).*(id)\s*=\s*["|\']{{ALViewElement_'.$viewElement.'}}["|\'][^/]*?(>)@i', $viewFile, $elementMatches, PREG_OFFSET_CAPTURE)

Near the middle of the pattern is the id part. After the equals sign I have ["|\'] for the opening quote, and I use the same thing again for the closing quote.

If my HTML looks like this, I get a match:

<section  id="{{ALViewElement_resume}}" data-test="testing" >
            <!--{{RESUME_ADD_CHANGE_PIECE}}-->
            <!--{{RESUME}}-->
        </section>

However, if I use single quotes instead, it doesn't match:

<section  id='{{ALViewElement_resume}}' data-test="testing" >
            <!--{{RESUME_ADD_CHANGE_PIECE}}-->
            <!--{{RESUME}}-->
        </section>

I can't figure out why my regular expression won't pick up the single quotes. Any ideas?
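One way to narrow this down is to run the pattern by itself against both snippets, outside the code that loads the HTML file. The sketch below assumes `$viewElement` is `'resume'` and inlines only the opening `<section>` tags as strings; the helper name `matchesViewElement` is hypothetical. The pattern string is copied unchanged from the question.

```php
<?php
// Hypothetical helper: runs the pattern from the question (unchanged)
// against a single HTML string and returns preg_match()'s result.
function matchesViewElement(string $viewFile, string $viewElement): int|false
{
    return preg_match(
        '@(<)([^\s]*).*(id)\s*=\s*["|\']{{ALViewElement_' . $viewElement . '}}["|\'][^/]*?(>)@i',
        $viewFile,
        $elementMatches,
        PREG_OFFSET_CAPTURE
    );
}

$doubleQuoted = '<section  id="{{ALViewElement_resume}}" data-test="testing" >';
$singleQuoted = '<section  id=\'{{ALViewElement_resume}}\' data-test="testing" >';

// preg_match() returns 1 for a match and 0 for no match (false on error).
var_dump(matchesViewElement($doubleQuoted, 'resume'));
var_dump(matchesViewElement($singleQuoted, 'resume'));
```

With PHP's bundled PCRE, both calls should report int(1), which suggests looking at the input that actually reaches preg_match() rather than at the pattern.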


Solution

  • Here is a general answer that you can adapt to your case. To match a value enclosed in either single or double quotes, there are two tricks:

    1) Use a backreference (\1 refers to capture group 1):

    The simple method, with a lazy quantifier:

    (["']).*?\1

    The optimised and safer method, which also handles escaped quotes:

    (["'])(?>[^"']|["'](?<!\1)|(?<=\\)\1)*+\1

    2) Use an alternation:

    (?>"....."|'.....')

    An interesting variant uses a branch reset group (?|...) with capture groups:

    (?|"...(###)..."|'...(###)...')

    Here the two capture groups share the same number, whichever alternative matches.

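    Note: the problem doesn't come from your pattern, which matches both of your examples as written. As a side remark, the | in ["|\'] is not needed: inside a character class it matches a literal pipe, so ["\'] is enough.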
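Applied to the id attribute from the question, the backreference and branch reset techniques might look roughly like this in PHP. This is a sketch under stated assumptions: the placeholder format `{{ALViewElement_<name>}}` comes from the question, the shortened `\bid\s*=\s*...` patterns are simplifications rather than the asker's full pattern, and the helper names (`findIdWithBackreference`, `findIdWithBranchReset`, `viewElementPattern`) are hypothetical. `preg_quote()` is used so that the braces in the placeholder are matched literally.

```php
<?php
// Hypothetical helper: backreference approach. Group 1 captures whichever
// quote opens the value, and \1 requires the same quote to close it.
function findIdWithBackreference(string $html): ?string
{
    return preg_match('/\bid\s*=\s*(["\'])(.*?)\1/i', $html, $m) ? $m[2] : null;
}

// Hypothetical helper: branch reset approach. Inside (?|...), the capture
// group in each alternative is numbered 1.
function findIdWithBranchReset(string $html): ?string
{
    return preg_match('/\bid\s*=\s*(?|"([^"]*)"|\'([^\']*)\')/i', $html, $m) ? $m[1] : null;
}

// Hypothetical helper: an element pattern in the spirit of the question's,
// with the placeholder escaped by preg_quote() and \2 enforcing matching quotes.
function viewElementPattern(string $viewElement): string
{
    $placeholder = preg_quote('{{ALViewElement_' . $viewElement . '}}', '@');
    return '@<([^\s>]+)[^>]*?\bid\s*=\s*(["\'])' . $placeholder . '\2[^>]*>@i';
}

$double = '<section  id="{{ALViewElement_resume}}" data-test="testing" >';
$single = '<section  id=\'{{ALViewElement_resume}}\' data-test="testing" >';
$mixed  = '<section  id="{{ALViewElement_resume}}\' data-test="testing" >';

foreach ([$double, $single] as $html) {
    echo findIdWithBackreference($html), ' | ', findIdWithBranchReset($html), PHP_EOL;
    echo preg_match(viewElementPattern('resume'), $html), PHP_EOL;
}

// Mismatched quotes: \2 makes the element pattern reject this tag.
echo preg_match(viewElementPattern('resume'), $mixed), PHP_EOL;
```

The element pattern uses [^>] instead of .* and [^/]*? so that the match stays inside the opening tag; that is a choice made for this sketch, not something the answer prescribes. Unlike a ["|\'] class on both sides, the backreference also rejects an id that opens with one kind of quote and closes with the other.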