php, regex, html-parsing, text-extraction

Get text between a custom opening HTML tag and its closing tag


$data = "<Data>hello</Data>";
preg_match_all("/\<Data\>[.]+\<\/Data\>/", $data, $match);
print_r($match);

This returns:

Array ( [0] => Array ( ) )

So I am guessing that a match is not made?


Solution

  • preg_match_all("#<Data>.+</Data>#", $data, $match);
    

    If you wanted to use / as the delimiter:

    preg_match_all("/<Data>.+<\/Data>/", $data, $match);
    

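    The main problem was that a . inside a character class matches only a literal period, so [.]+ requires one or more dots between the tags and can never match "hello". Using a delimiter other than / (here #) also removes the need to escape the slash in </Data>. Note that you don't have to escape < or > either way. If you want to be able to extract the inner value, add a capture group: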

    preg_match_all("#<Data>(.+)</Data>#", $data, $match);
    

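    "hello" will now be in $match[1][0] in your example, because preg_match_all collects every capture of the first group into the $match[1] array. Keep in mind that .+ is greedy: if the input contains several <Data> elements, use .+? so that each match stops at the nearest closing tag. Note also that regex is not well suited to parsing XML or HTML, so switch to a real parser for anything non-trivial.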
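
    As a rough sketch of both routes, the snippet below runs the capture-group regex with a lazy quantifier and then does the same extraction with PHP's built-in DOMDocument. The second <Data> element, the <root> wrapper, and the variable names $viaRegex and $viaDom are assumptions added for illustration; they are not part of the original question.

```php
<?php
// Sample input; the second <Data> element is added here for illustration only.
$data = "<Data>hello</Data><Data>world</Data>";

// Regex route: the lazy quantifier (.+?) stops at the nearest closing tag,
// so each element is captured separately instead of as one long match.
preg_match_all("#<Data>(.+?)</Data>#", $data, $match);
$viaRegex = $match[1]; // captures of group 1, one entry per element

// Parser route: wrap the fragment in a hypothetical <root> element so that
// it is a well-formed XML document with a single root.
$doc = new DOMDocument();
$doc->loadXML("<root>" . $data . "</root>");

$viaDom = [];
foreach ($doc->getElementsByTagName("Data") as $node) {
    $viaDom[] = $node->textContent; // text inside each <Data> element
}

print_r($viaRegex);
print_r($viaDom);
```

    Both arrays should contain "hello" and "world". The DOM version needs a little more setup, but it keeps working when the elements gain attributes, contain nested tags, or span several lines, which are the cases where the regex tends to break.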