I'm hoping this is possible with Simple HTML DOM. I'm scraping a page that looks like this:
<h5>this is title 1</h5>
<img>
<img>
<img>
<h5>this is title 2</h5>
<img>
<img>
<h5>this is title 3</h5>
<img>
<img>
<img>
<img>
etc...
I'm trying to get it to look something like:
<h5>this is title 1</h5>
<img>
<h5>this is title 1</h5>
<img>
<h5>this is title 1</h5>
<img>
<h5>this is title 2</h5>
<img>
<h5>this is title 2</h5>
<img>
Which means that for each img I need to find and grab the nearest preceding h5, I think. There are no parent divs or any other structure to make this easier; the page is pretty much as I described it.
The code I'm using looks something like this (simplified):
foreach ($html->find('img') as $image) {
    // do stuff to the img
    $title = $html->find('h5')->prev_sibling();
    echo $title;
    echo $image;
}
Everything I've tried with prev_sibling gets me a "Fatal error: Call to a member function prev_sibling() on a non-object", and I'm wondering if what I'm trying to do is even possible with PHP Simple HTML DOM. I hope so; all the other scrapers I've tried were making me pull my hair out.
Essentially, you want to select all h5 elements, as well as all the img elements. Then you loop through them and check their type. If it's an h5 element, you update your $title variable but don't echo anything. If it's an img, you simply echo the $title before the image. No need to go hunting for the h5 now, since you've already cached it.
Here's an example:
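// Start with an empty title in case an img appears before the first h5.
$title = '';

foreach ( $html->find('h5, img') as $el )
{
    // An h5 only updates the cached title; nothing is printed for it.
    if ( $el->tag == 'h5' )
    {
        $title = $el->plaintext;
        continue;
    }

    // Every img is printed with the most recent title in front of it.
    echo "<h5>$title</h5>";
    echo $el->outertext;
}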
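If you want to try the same caching idea without installing anything, here is a rough, self-contained sketch that swaps Simple HTML DOM for PHP's built-in DOMDocument and DOMXPath. It is not the code from the answer above, just the same loop under different assumptions: the function name pairTitlesWithImages, the $sample string, and the src values are all made up for illustration.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension instead of Simple HTML DOM.
// pairTitlesWithImages() is a hypothetical helper name.
function pairTitlesWithImages(string $markup): string
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($markup);

    $xpath = new DOMXPath($doc);
    $title = '';   // text of the most recent h5 seen so far
    $out   = '';

    // A single step matching both tags yields the nodes in document order.
    foreach ($xpath->query('//*[self::h5 or self::img]') as $el) {
        if ($el->nodeName === 'h5') {
            // Cache the title; nothing is emitted for the h5 itself.
            $title = trim($el->textContent);
            continue;
        }

        // Emit the cached title in front of every img.
        $out .= '<h5>' . htmlspecialchars($title) . '</h5>' . "\n";
        $out .= $doc->saveHTML($el) . "\n";
    }

    return $out;
}

// Made-up sample mirroring the structure described in the question.
$sample = '<h5>this is title 1</h5><img src="a.jpg"><img src="b.jpg">'
        . '<h5>this is title 2</h5><img src="c.jpg">';

echo pairTitlesWithImages($sample);
```

The result is built up as a string rather than echoed inside the loop, which makes it easier to inspect or post-process. The title caching works the same way as in the Simple HTML DOM version.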