Why doesn't (.*?) get everything between html tags in preg_match_all?


After four hours I still can't work out why I'm not getting all of the data between the two tags. Three of the four items are returned, but the fourth (the "35 drops" li) is not.

$string = '<ul>
    <li>
    <strong>½ cup</strong>&nbsp;white wine </li>
    <li>
    <strong>½ cup</strong>&nbsp;extra virgin olive oil</li>
    <li>
    <strong>35 drops</strong> of water
    </li>
    <li>
    <strong>½ cup</strong>&nbsp;golden flaky raspberries</li>
    </ul>
';

preg_match_all("/<li>\n<strong>(.*?)<\/strong>(.*?)<\/li>/", $string, $matched);

This is the result that I'm getting:

0   =>  array(3
        0   =>  <li>
                <strong>½ cup</strong>&nbsp;white wine vinegar</li>
        1   =>  <li>
                <strong>½ cup</strong>&nbsp;extra virgin olive oil</li>
        2   =>  <li>
                <strong>½ cup</strong>&nbsp;golden raspberries</li>
        )   
1   =>  array(3
        0   =>  ½ cup
        1   =>  ½ cup
        2   =>  ½ cup
        )
2   =>  array(3
        0   =>  &nbsp;white wine vinegar
        1   =>  &nbsp;extra virgin olive oil
        2   =>  &nbsp;golden raspberries
        )
)

All I'm trying to retrieve, for every li, is everything inside the strong tags and everything outside them, the way arrays 1 and 2 show it for the three items that did match.

http://www.phpliveregex.com/p/lf8


Solution

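  • The closing tag for the "35 drops" item is on a new line, and your regex doesn't account for that newline. By default . does not match a newline, so (.*?) stops at the end of the line and never reaches the </li>. Allowing an optional newline before the closing tag fixes it: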

    <li>\n<strong>(.*?)<\/strong>(.*?)\n?<\/li>
                                      ^^^
    

    Slightly better would be to use a negated character class, [^<], which matches any character except <, newlines included:

    <li>\n<strong>([^<]*)<\/strong>([^<]*)<\/li>
    

    regex101 demo
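
To try the negated character class locally, here is a minimal sketch. The sample string is shortened to two items from the question. The \s* after <li> and the trim() calls are my own additions, not part of the pattern above. They are there on the assumption that the real markup may be indented, as it is displayed in the question.

```php
<?php
// Shortened sample based on the question's markup (two of the four items).
$string = <<<'HTML'
<ul>
    <li>
    <strong>½ cup</strong>&nbsp;extra virgin olive oil</li>
    <li>
    <strong>35 drops</strong> of water
    </li>
</ul>
HTML;

// [^<]* matches any character except "<", newlines included, so the line
// break before the closing </li> no longer prevents a match.
// \s* (an assumption, see above) tolerates a newline plus indentation.
$pattern = '/<li>\s*<strong>([^<]*)<\/strong>([^<]*)<\/li>/';
preg_match_all($pattern, $string, $matched);

// The second group can now include the trailing newline, so trim it away.
$amounts = array_map('trim', $matched[1]);
$ingredients = array_map('trim', $matched[2]);

print_r($amounts);
print_r($ingredients);
```

Both items should now be captured, including the one whose </li> sits on its own line. Note that &nbsp; is kept as literal text in the second group, because the regex sees the raw markup.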

    Better still, use an HTML parser.
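
A rough sketch of the parser route, using PHP's bundled DOM extension (DOMDocument). The answer does not name a specific parser, so this choice is an assumption. The variable names, the shortened sample markup and the way the "rest of the li" text is assembled are also only illustrative.

```php
<?php
// Shortened sample based on the question's markup (two of the four items).
$html = <<<'HTML'
<ul>
    <li>
    <strong>½ cup</strong>&nbsp;extra virgin olive oil</li>
    <li>
    <strong>35 drops</strong> of water
    </li>
</ul>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Common workaround: an XML encoding hint so the input is read as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

$items = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    $strong = $li->getElementsByTagName('strong')->item(0);
    if ($strong === null) {
        continue; // skip list items without a <strong> element
    }

    // Everything in the <li> except the <strong> element itself.
    $rest = '';
    foreach ($li->childNodes as $child) {
        if (!$child->isSameNode($strong)) {
            $rest .= $child->textContent;
        }
    }

    // &nbsp; is decoded to U+00A0 (bytes C2 A0), which trim() leaves
    // alone, so turn it into a plain space first.
    $rest = str_replace("\xC2\xA0", ' ', $rest);

    $items[] = [
        'amount'     => trim($strong->textContent),
        'ingredient' => trim($rest),
    ];
}

print_r($items);
```

Because the parser works on the element tree rather than on the raw text, where the line breaks fall inside each li no longer matters.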