I have the following string.
$data = "<meta charset='UTF-8'>
<meta name='keywords' content='your, tags'>
<meta name='description' content='150 words'>
<meta name='subject' content='your website's subject'>
<meta name='copyright' content='company name'>
<meta name='language' content='ES'>
<meta name='robots' content='index,follow'>
<meta name='revised' content='Sunday, July 18th, 2010, 5:15 pm'>
<meta name='abstract' content=''>
<meta name='topic' content=''>
<meta name='summary' content=''>
<meta name='Classification' content='Business'>
<meta name='author' content='name, [email protected]'>
<meta name='designer' content=''>
<meta name='reply-to' content='[email protected]'>
<meta name='owner' content=''>
<meta name='url' content='http://www.websiteaddrress.com'>
<meta name='identifier-URL' content='http://www.websiteaddress.com'>
<meta name='directory' content='submission'>
<meta name='pagename' content='jQuery Tools, Tutorials and Resources - O'Reilly Media'>
<meta name='category' content=''>
<meta name='coverage' content='Worldwide'>
<meta name='distribution' content='Global'>
<meta name='rating' content='General'>
<meta name='revisit-after' content='7 days'>
<meta name='subtitle' content='This is my subtitle'>
<meta name='target' content='all'>
<meta name='HandheldFriendly' content='True'>
<meta name='MobileOptimized' content='320'>
<meta name='date' content='Sep. 27, 2010'>
<meta name='search_date' content='2010-09-27'>
<meta name='DC.title' content='Unstoppable Robot Ninja'>
<meta name='ResourceLoaderDynamicStyles' content=''>
<meta name='medium' content='blog'>
<meta name='syndication-source' content='https://mashable.com/2008/12/24/free-brand-monitoring-tools/'>
<meta name='original-source' content='https://mashable.com/2008/12/24/free-brand-monitoring-tools/'>
<meta name='verify-v1' content='dV1r/ZJJdDEI++fKJ6iDEl6o+TMNtSu0kv18ONeqM0I='>
<meta name='y_key' content='1e39c508e0d87750'>
<meta name='pageKey' content='guest-home'>
<meta itemprop='name' content='jQTouch'>
<meta http-equiv='Expires' content='0'>
<meta http-equiv='Pragma' content='no-cache'>
<meta http-equiv='Cache-Control' content='no-cache'>
<meta http-equiv='imagetoolbar' content='no'>
<meta http-equiv='x-dns-prefetch-control' content='off'>";
I want to extract the values for the listed meta tags, including both the name meta tags and the http-equiv meta tags.
This is where I'm at with this:
// explode the string by newline
$parts = explode("\n", $data);
// loop through each meta tag line
foreach ($parts as $part) {
    // match inside the name attribute and the content attribute
    preg_match("/<meta name=\"(.*)\" content=\"(.*)\" \/>/i", $part, $matches);
    // prints "<pre>Array()</pre>"
    print "<pre>" . print_r($matches, true) . "</pre>";
}
I think there's something wrong with my regular expression.
The attributes use single quotes, not double quotes, and the tag closes with a plain > rather than a space followed by />:
preg_match("/<meta name='([^']*)' content='([^']*)'\s?\/?>/i", $part, $matches);
Explanation:
[^']* # match everything up to the next '
\s? # an optional whitespace character
\/? # an optional slash
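To see the pattern at work inside the question's loop, here is a minimal sketch on two sample lines. The (?:name|http-equiv) alternation and the $found array are additions made for this illustration so that http-equiv tags are picked up as well; they are not part of the pattern above.

```php
<?php
// Two sample lines in the same style as the question's $data.
$data = "<meta name='robots' content='index,follow'>\n<meta http-equiv='Pragma' content='no-cache'>";

$found = [];
foreach (explode("\n", $data) as $part) {
    // (?:name|http-equiv) is an assumption added here so both tag kinds match.
    if (preg_match("/<meta (?:name|http-equiv)='([^']*)' content='([^']*)'\s?\/?>/i", $part, $matches)) {
        // $matches[1] is the attribute value, $matches[2] is the content value.
        $found[$matches[1]] = $matches[2];
    }
}
print_r($found);
```

Note that a line whose content value itself contains an apostrophe, such as the subject line in the question, will not match, because [^']* stops at the first '.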
Here is a version that also accepts double quotes and multiple spaces:
"/<meta\s+name=['\"]([^'\"]*)['\"]\s+content=['\"]([^'\"]*)['\"]\s*\/?>/i"
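One possible variation, sketched here as an assumption rather than as the pattern above, uses a backreference so that each value must end with the same quote character that opened it. That lets a double-quoted value contain an apostrophe. The pattern is written in a single-quoted PHP string because \1 inside a double-quoted string would be read as an octal escape. The sample data and the $values array are made up for the illustration.

```php
<?php
// Sample input mixing quote styles, a self-closing tag and extra spaces.
$data = <<<'HTML'
<meta name="pagename" content="O'Reilly Media" />
<meta http-equiv='Pragma'   content='no-cache'>
HTML;

// Groups 1 and 3 capture the opening quote; \1 and \3 require the same quote to close.
$pattern = '/<meta\s+(?:name|http-equiv)=([\'"])(.*?)\1\s+content=([\'"])(.*?)\3\s*\/?>/i';

$values = [];
preg_match_all($pattern, $data, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    // $set[2] is the name or http-equiv value, $set[4] is the content value.
    $values[$set[2]] = $set[4];
}
print_r($values);
```

Because preg_match_all scans the whole string, there is no need to explode it by newline first.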
That said, a DOM parser is generally a more robust way to inspect HTML elements than a regular expression.
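As a rough sketch of the DOM approach, PHP's built-in DOMDocument can load the markup and hand back each meta element, so quote style and spacing stop mattering. The three-line $html sample and the choice to key the result by name, http-equiv or itemprop are assumptions made for this example.

```php
<?php
// A short sample in the style of the question's $data.
$html = "<meta charset='UTF-8'>\n"
      . "<meta name='robots' content='index,follow'>\n"
      . "<meta http-equiv='Pragma' content='no-cache'>";

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$values = [];
foreach ($doc->getElementsByTagName('meta') as $meta) {
    // Use whichever identifying attribute the tag carries; tags with none are skipped.
    foreach (['name', 'http-equiv', 'itemprop'] as $attr) {
        if ($meta->hasAttribute($attr)) {
            $values[$meta->getAttribute($attr)] = $meta->getAttribute('content');
            break;
        }
    }
}
print_r($values);
```

The charset tag is skipped here because it has none of the three identifying attributes.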