I am not good with regex.
I am trying to read webpage titles. I have encountered some pages with structures like <title itemprop="name">test - Google+</title>
or <title id="name">Safaricom - Google+</title>
When I try reading them with the code below I get "untitled". How can I fix this?
$header_data = Array();
if (preg_match("@<title *>(.*?)<\/title*>@si", $file, $header_data)) {
    $title = trim($header_data[1]);
}
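The problem here is the " *" after title in the opening tag.
It matches a space zero or more times, so the pattern ONLY allows spaces between title and >. A tag with attributes, such as <title itemprop="name">, never matches. (In the closing tag, the * in title*> applies to the letter e, which is harmless but not needed.)
Try allowing any characters before the closing >:
<title.*>(.*?)<\/title>
Note that .* is greedy, so on a page with more than one </title> it can run past the first tag. <title[^>]*> is a stricter alternative that stops at the first >.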
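To see the fix end to end, here is a minimal sketch of the title lookup using the stricter [^>]* form. The helper name extract_title and the "untitled" fallback are assumptions made for illustration; they are not taken from the original code, which only shows the preg_match call.

```php
<?php
// Hypothetical helper (not from the original post): returns the page title,
// or "untitled" when no <title> element is found.
function extract_title(string $html): string
{
    // \b      : "title" must end here, so <titlebar> is not matched
    // [^>]*   : skip any attributes (itemprop, id, ...) up to the tag's own ">"
    // (.*?)   : capture the title text lazily, up to the first closing tag
    // s flag  : "." also matches newlines; i flag: case-insensitive tag name
    if (preg_match('@<title\b[^>]*>(.*?)</title\s*>@si', $html, $m)) {
        return trim($m[1]);
    }
    return 'untitled';
}

echo extract_title('<title itemprop="name">test - Google+</title>'), "\n"; // test - Google+
echo extract_title('<title id="name">Safaricom - Google+</title>'), "\n";  // Safaricom - Google+
echo extract_title('<p>no title here</p>'), "\n";                          // untitled
```

Because @ is used as the delimiter, the / in </title> does not need escaping. For arbitrary real-world HTML a parser is usually more robust than a regex, but for pulling out a single title this pattern covers both examples from the question.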