I have a regular expression that searches HTML files for elements. Whether an element matches is determined mainly by its id, which has to contain some special characters that I decided to use. The problem is that some people write ids with double quotes and others with single quotes, and I want to catch either case. My regular expression is this:
preg_match('@(<)([^\s]*).*(id)\s*=\s*["|\']{{ALViewElement_'.$viewElement.'}}["|\'][^/]*?(>)@i', $viewFile, $elementMatches, PREG_OFFSET_CAPTURE)
Close to the middle you'll see where I have id. After the equals sign I have ["|\'] and then at the end I have the same thing for the closing quote.
If my html looks like this, I get a match:
<section id="{{ALViewElement_resume}}" data-test="testing" >
<!--{{RESUME_ADD_CHANGE_PIECE}}-->
<!--{{RESUME}}-->
</section>
However, if I use single quotes instead, it doesn't match:
<section id='{{ALViewElement_resume}}' data-test="testing" >
<!--{{RESUME_ADD_CHANGE_PIECE}}-->
<!--{{RESUME}}-->
</section>
I can't seem to figure out what's wrong with my regular expression that it won't pick up the single quotes. Any ideas?
I'll give you a general answer, which you can then adapt to your case. To match single or double quotes, the tricks are:
1) use a backreference (\1 refers to capture group 1):
the simple method with a lazy quantifier:
(["']).*?\1
the optimised and safer method (deals with escaped quotes):
(["'])(?>[^"']|["'](?<!\1)|(?<=\\)\1)*+\1
2) use an alternation:
(?>"....."|'.....')
interesting variant with capture groups:
(?|"...(###)..."|'...(###)...')
Here (?|...) is a branch reset group, so the capture groups in the two branches share the same number.
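As a rough illustration of the tricks above, here is a minimal PHP sketch that runs the backreference, alternation and branch reset approaches against a small sample string. The sample markup and the variable names are made up for the example, and the alternation branches are filled in with simple negated character classes as one possible choice.

```php
<?php
// Sample input with one double-quoted and one single-quoted attribute value
// (invented for this example).
$html = '<a id="first" class=\'second\'>';

// 1) Backreference: \1 must repeat whichever quote group 1 captured.
preg_match_all('~(["\']).*?\1~', $html, $m1);
$quoted = $m1[0];      // full quoted strings, quotes included

// 2) Alternation: one branch per quote style.
preg_match_all('~"[^"]*"|\'[^\']*\'~', $html, $m2);
$alternated = $m2[0];  // same strings as above

// 3) Branch reset (?|...): both branches store the content in group 1.
preg_match_all('~(?|"([^"]*)"|\'([^\']*)\')~', $html, $m3);
$contents = $m3[1];    // contents only, without the quotes
```

With the branch reset version you can read group 1 without checking which branch matched, which is the main reason to prefer it over a plain alternation with two separately numbered groups.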
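Note: the problem doesn't come from your pattern. The class ["|\'] already matches either quote. Inside a character class | is a literal pipe rather than an alternation, so ["\'] is enough.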
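If you want to adapt the backreference trick to the pattern in the question, one possible sketch is shown below. The function name findViewElement and the returned array keys are hypothetical, and the pattern assumes that no attribute value contains a > character. It is a simplification of your pattern, not a drop-in replacement.

```php
<?php
// Hypothetical helper: returns the tag name, the byte offset and the quote
// style of the element whose id is {{ALViewElement_<name>}}, or null when
// nothing matches.
function findViewElement(string $html, string $name): ?array
{
    // Group 1 is the tag name, group 2 the opening quote. \2 forces the
    // closing quote to be the same character as the opening one.
    $pattern = '@<([a-z][\w-]*)\b[^>]*?\sid\s*=\s*(["\'])'
             . '\{\{ALViewElement_' . preg_quote($name, '@') . '\}\}'
             . '\2[^>]*>@i';

    if (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null;
    }
    return ['tag' => $m[1][0], 'offset' => $m[0][1], 'quote' => $m[2][0]];
}

$double = '<section id="{{ALViewElement_resume}}" data-test="testing" >';
$single = "<section id='{{ALViewElement_resume}}' data-test=\"testing\" >";

$fromDouble = findViewElement($double, 'resume');
$fromSingle = findViewElement($single, 'resume');
```

Both calls should find the section element. An id opened with one kind of quote and closed with the other is rejected, which the plain character class on both sides would accept.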