I am trying to scrape a website to get some practice with QueryPath.
Here is what I have so far. It gives me this error:
Creating default object from empty value
// URL to scrape
$baseurl = 'http://some-site-with-a-table-of-items-that-contain-links.com';
// Get all rows from the table
$rows = htmlqp($baseurl, '#items_table')->find('tr');
// Initialize items array
$items = array();
// Initialize counter
$i = 0;
// Iterate through rows of items
foreach ($rows as $row) {
    // Get the URL for the item in this row
    $url = qp($row)->find('.link_txt a')->attr('href');
    // Select all the info in the item detail box
    $item = htmlqp($url)->find('.item_detail_box');
    // Assign the item attributes to an array
    $items[$i] = [
        // The qp item $row is from the info on the main table of items
        'img_thumb'  => qp($row)->find('.reflection')->attr('src'),
        'name'       => qp($row)->find('.link_txt a')->text(),
        'item_level' => qp($row)->find('.col_center')->text(),
        'req_level'  => qp($row)->find('.col_right')->text(),
        'url'        => $url,
        // The qp item $item is from the actual item detail page
        //'img' => qp($item)->find('.reflection')->attr('src'),
        //'is_unique' => qp($item)->find('.unique')->text(),
    ];
    $i++;
}
$data = print_r($items, true);
return '<pre>' . $data . '</pre>';
The error occurs if I uncomment either the 'img' or the 'is_unique' array line.
Everything else works and gives expected output when those lines are commented out.
The problem was that QueryPath matched nothing for the selector on one of the rows, so there was no anchor tag to read from.
I was trying to get the text and href of a link in each table row.
However, the first row in my loop was the table header, not an item row, so it contained no link.
Adding a check in the loop fixed my issue:
$url_ext = qp($row)->find('.ic_link_txt a')->attr('href');
if ($url_ext != NULL && $url_ext != "") {
    // ... process the row here (the header row has no link and is skipped) ...
}
It was a silly mistake on my part, from not knowing enough about QueryPath.
(Also related to GitHub issue https://github.com/technosophos/querypath/issues/130)
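For anyone who wants to try the idea in isolation, here is a minimal sketch of the same "skip rows without a link" guard. To keep it runnable without installing anything, it uses PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of QueryPath, so it is not the code from my project. The sample markup, the class names, and the item fields are made up for illustration.

```php
<?php
// Hypothetical sample markup: a header row with no link, then two item rows.
$html = <<<HTML
<table id="items_table">
  <tr><th>Name</th><th>Level</th></tr>
  <tr><td class="link_txt"><a href="/item/1">Sword</a></td><td class="col_center">10</td></tr>
  <tr><td class="link_txt"><a href="/item/2">Shield</a></td><td class="col_center">12</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$items = [];
foreach ($xpath->query('//table[@id="items_table"]//tr') as $row) {
    // Look for the link relative to the current row; item(0) is null if nothing matched.
    $link = $xpath->query('.//td[@class="link_txt"]/a', $row)->item(0);
    if ($link === null || $link->getAttribute('href') === '') {
        continue; // header row (or any row without a link): nothing to follow
    }
    $items[] = [
        'name' => trim($link->textContent),
        'url'  => $link->getAttribute('href'),
    ];
}

print_r($items);
```

The point is the same as in the QueryPath version: check that the selector actually matched something before using the result to load a detail page. Using continue also keeps the header row out of the $items array entirely.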