I am experimenting with phpQuery (https://code.google.com/p/phpquery/) to scrape data from my website. I want to extract the meta information from a page.
Here is what I have tried so far:
$html = phpQuery::newDocumentHTML($file, $charset = 'utf-8');
$MetaItems = [];
foreach (pq('meta') as $keys) {
    $names = trim(strtolower(pq($keys)->attr('name')));
    if ($names !== null && $names !== '') {
        array_push($MetaItems, $names);
    }
}
for ($i = 0; $i < count($MetaItems); $i++) {
    $test = 'meta[name="' . $MetaItems[$i] . '"]';
    echo pq($test)->html();
}
In the code above, $MetaItems ends up holding the name attribute of every meta tag, and this array is filled correctly.
However, selecting each tag and extracting its text does not work. How do I get the above code to work? Thanks.
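You want an associative array of name => content, correct? A meta element has no inner HTML, so html() returns an empty string; the value you are after is in the content attribute. Try this: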
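$metaItems = array();
foreach (pq('meta') as $meta) {
    $key = pq($meta)->attr('name');
    // Skip tags without a name, e.g. <meta charset="utf-8"> or http-equiv tags
    if ($key === null || trim($key) === '') {
        continue;
    }
    // The text lives in the content attribute, not in the element body
    $metaItems[$key] = pq($meta)->attr('content');
}
var_dump($metaItems);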
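If you want to check the same idea without installing phpQuery, here is a minimal sketch that swaps in PHP's built-in DOMDocument (the DOM extension) instead. The sample HTML string is made up for illustration, and lowercasing the names simply mirrors what the question's code does; treat it as one possible approach rather than the phpQuery way.

```php
<?php
// Hypothetical sample page; any HTML string would do here.
$html = '<html><head>'
      . '<meta charset="utf-8">'
      . '<meta name="Description" content="A sample page">'
      . '<meta name="keywords" content="php, scraping">'
      . '</head><body></body></html>';

$doc = new DOMDocument();
// Suppress the warnings libxml raises on imperfect real-world markup.
@$doc->loadHTML($html);

$metaItems = array();
foreach ($doc->getElementsByTagName('meta') as $meta) {
    // getAttribute() returns an empty string when the attribute is absent.
    $name = strtolower(trim($meta->getAttribute('name')));
    if ($name === '') {
        continue; // e.g. <meta charset="utf-8"> has no name attribute
    }
    // As with phpQuery, the value is read from the content attribute.
    $metaItems[$name] = $meta->getAttribute('content');
}

print_r($metaItems);
```

As in the phpQuery version, the key point is that the value is read from the content attribute rather than from the element's body.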