Tags: regex, preg-match

Getting webpage title regex


I am not good with regex.

I am trying to read webpage titles. I have encountered some pages with structures like <title itemprop="name">test - Google+</title> or <title id="name">Safaricom - Google+</title>.

When I try reading them with the code below, I get "untitled". How can I fix this?

$header_data = array();
if (preg_match("@<title *>(.*?)<\/title*>@si", $file, $header_data)) {
    $title = trim($header_data[1]);
}

Solution

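The problem here is the use of *.

* means that the preceding character or group may occur zero or more times. In <title *> it applies to the space, so the pattern only allows spaces between "title" and ">". An opening tag that carries attributes, such as <title itemprop="name">, never matches. The same applies to <\/title*>, where * only repeats the final "e".

Try allowing any characters before the closing ">":

<title.*>(.*?)<\/title>

Note that .* is greedy, so on a page containing more than one </title> it can capture the wrong one. <title[^>]*>(.*?)<\/title> avoids this, because [^>]* stops at the end of the opening tag.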
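
To make the fix concrete, here is a minimal PHP sketch. The function name extract_title and the sample strings are assumptions made for illustration, and the "untitled" fallback is only inferred from the question. It uses [^>]* rather than .* so that the attribute part cannot run past the end of the opening tag.

```php
<?php
// Hypothetical helper, not from the original post: returns the contents of the
// first <title> element, or "untitled" (the fallback the question mentions).
function extract_title(string $html): string
{
    // \b      : "title" must end here, so <titlebar> is not accepted
    // [^>]*   : optional attributes, stopping at the end of the opening tag
    // (.*?)   : the title text, as short as possible
    // s and i : let . match newlines and ignore letter case
    if (preg_match('@<title\b[^>]*>(.*?)</title\s*>@si', $html, $m)) {
        return trim($m[1]);
    }
    return 'untitled';
}

$page = '<html><head><title itemprop="name">test - Google+</title></head></html>';

// The pattern from the question fails because of the attribute.
var_dump(preg_match('@<title *>(.*?)<\/title*>@si', $page)); // int(0)

echo extract_title($page), "\n";                                          // test - Google+
echo extract_title('<title id="name">Safaricom - Google+</title>'), "\n"; // Safaricom - Google+
echo extract_title('<p>no title here</p>'), "\n";                         // untitled
```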