Tags: php, regex, preg-match-all, text-extraction

Extract text from multiple HTML tags


I am trying to extract the text between repeated tags with preg_match_all, but I can't get it to work:

$str = "<html>1</html><html>2</html><html>3</html>";
$matches = array(); 
preg_match_all("(?<=<html>)(.*?)(?=</html>)", $str, $matches);

foreach ($matches as $d)
{
   echo $d;
}

What am I doing wrong? The expected output is:

123

Solution

  • This should work for you:

    $str = "<html>1</html><html>2</html><html>3</html>";
    preg_match_all("~(?<=<html>).*?(?=</html>)~", $str, $matches);
    
    foreach ($matches[0] as $d) {
       echo $d;
    }
    

    Output:

    123
    

    The changes are:

    • Add the missing regex delimiters (~ here) around the pattern passed to preg_match_all.
    • Remove the capturing group: because the tags are matched with lookbehind and lookahead, the entire match is already the text you want.
    • Loop over $matches[0] instead of $matches. $matches is an array of arrays, and index 0 holds the full-pattern matches.
    • There is no need to declare or initialize $matches before calling preg_match_all; it is passed by reference and filled in by the function.
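
To illustrate the points about the capturing group and $matches[0], here is a minimal sketch that runs the lookaround pattern next to a capturing-group variant on the same input. The variable names $lookaround and $captured are made up for this example and do not appear in the question or the answer above.

```php
<?php
// Sample input from the question.
$str = "<html>1</html><html>2</html><html>3</html>";

// Variant A: lookarounds only, so the full match is just the tag content.
preg_match_all('~(?<=<html>).*?(?=</html>)~', $str, $lookaround);

// Variant B: match the tags literally and capture the content in group 1.
preg_match_all('~<html>(.*?)</html>~', $str, $captured);

// Index 0 always holds the full-pattern matches; index 1 holds group 1.
// In variant B the full matches still include the tags, so use index 1.
echo implode('', $lookaround[0]), PHP_EOL; // prints 123
echo implode('', $captured[1]), PHP_EOL;   // prints 123
```

Either variant gives the same text here. Note that . does not match newlines by default, so if the content between the tags can span several lines, the s modifier would be needed after the closing delimiter.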