php regex html-parsing preg-match

preg_match not matching subpattern with html tag


I have a regex:

$reg = '/<a class="title".*>(.*)<\/a>/';

and the following text:

$text = '<h3 class="carousel-post-title"><a class="title" href="/first-link/">Some text<br /><span class="title-highlight">with a span</span></a></h3>';

which I pass to preg_match:

$matches = [];
preg_match($reg, $text, $matches);

This returns

Array
(
    [0] => <a class="title" href="/first-link/">Some text<br /><span class="title-highlight">with a span</span></a>
    [1] => 
)

whereas

$text2 = '<h3 class="carousel-post-title"><a class="title" href="/second-link/">Some text here</a></h3>';

preg_match($reg, $text2, $matches);

returns

Array
(
    [0] => <a class="title" href="/second-link/">Some text here</a>
    [1] => Some text here
)

Why is that? Why does the subpattern "(.*)" not match 'with a span'?


Solution

  • Change your pattern to

    $reg = '/<a class="title"[^>]*>([^<]*)<\/a>/';
    

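    Here [^>]* matches anything except > inside the opening tag, and ([^<]*) captures anything except <, so the capture stops at the next tag. Your original pattern returns an empty group because the first .* is greedy: it consumes everything up to the last > that is still followed by </a> (the > of </span>), which leaves nothing for (.*) to capture. Note that ([^<]*) cannot cross nested tags, so this pattern matches $text2 but finds no match in the first $text.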

    <a class="title"[^>]*> // the opening tag, with any attributes
    ([^<]*) // capture everything up to the next <
    <\/a> // the closing tag
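
The behaviour of both patterns can be checked with a short script. This is a minimal sketch using the two strings from the question. The lazy pattern in `$lazy` and the `strip_tags` step are assumptions added for illustration, not part of the answer above; they are one possible way to capture link content that contains nested tags.

```php
<?php
// Inputs copied from the question.
$text  = '<h3 class="carousel-post-title"><a class="title" href="/first-link/">Some text<br /><span class="title-highlight">with a span</span></a></h3>';
$text2 = '<h3 class="carousel-post-title"><a class="title" href="/second-link/">Some text here</a></h3>';

// Original pattern: the first greedy .* runs to the end of the string and
// backtracks only as far as the last ">" that is still followed by "</a>".
// In $text that is the ">" of "</span>", so (.*) captures an empty string.
preg_match('/<a class="title".*>(.*)<\/a>/', $text, $greedy);

// Pattern from the answer: [^<]* cannot cross the nested <br /> and <span>,
// so it matches $text2 but finds no match in $text.
$answer = '/<a class="title"[^>]*>([^<]*)<\/a>/';
$found1 = preg_match($answer, $text, $m1);   // 0: no match
$found2 = preg_match($answer, $text2, $m2);  // 1: $m2[1] is 'Some text here'

// Assumed variant (not from the answer): a lazy (.*?) stops at the first
// "</a>" and keeps the nested markup; the s flag lets "." span newlines.
$lazy = '/<a class="title"[^>]*>(.*?)<\/a>/s';
preg_match($lazy, $text, $inner);

// strip_tags() removes the nested tags if only the text is wanted.
$plain = strip_tags($inner[1]);

echo $found1, "\n";    // 0
echo $m2[1], "\n";     // Some text here
echo $inner[1], "\n";  // Some text<br /><span class="title-highlight">with a span</span>
echo $plain, "\n";     // Some textwith a span
```

Regular expressions remain fragile against arbitrary HTML (attribute order, nested links, line breaks), so for anything beyond a fixed, known snippet an HTML parser is usually the safer choice.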