php xpath css-selectors goutte

Goutte Selectors on Markup that may or may not be present


I'm sure this is simple but I'm struggling to get it right. I have the following markup:

<div id="container">
   <h3>Instructions</h3>
   <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
   <h3>Directions</h3>
   <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
   <h3>Warnings</h3>
   <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
</div>

Any of the three headings might be missing, and they can appear in any order. I want to extract the text of each p tag using Goutte and know which heading it belongs to.

I've tried variations of the following without success:

$node->filter('div#container h3')->each(function (Crawler $node) {
   switch ($node->text()) {
      case 'Instructions':
         //$instructions = $node->filter('p')->text();
         //$instructions = $node->closest('p')->text();
         $instructions = $node->parents()->filter('p')->text();
         break;
    //etc....
   }
});

I've also tried using XPath with the preceding-sibling axis but can't get it right. I've been trying things along the lines of:

$node->filterXPath("/div[preceding-sibling::h3[normalize-space() = 'Instructions']]");

Solution

  • It doesn't seem like Crawler has a way of traversing to the next immediate sibling of an element, so you may need to use XPath. Use the following-sibling:: axis with a [position() = 1] predicate to limit the match to the first p that comes after the h3 you want:

    $node->filterXPath("//div[@id='container']/h3[normalize-space() = 'Instructions']/following-sibling::p[position() = 1]");
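
To see the following-sibling idea handle headings that are missing or reordered, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath instead of Goutte, so it runs without any packages. The helper name extractSections and the sample markup are made up for illustration. It also swaps in a slightly stricter predicate, following-sibling::*[1][self::p], which only matches when the element directly after the h3 is a p, so a heading with no paragraph of its own does not pick up the paragraph under the next heading.

```php
<?php
// Sketch only: uses PHP's DOM extension, not Goutte's Crawler.
// extractSections() is a hypothetical helper, not a library function.
function extractSections(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml emits for HTML fragments.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $sections = [];
    foreach ($xpath->query("//div[@id='container']/h3") as $h3) {
        // Take the element right after the heading, but only if it is a <p>.
        $p = $xpath->query('following-sibling::*[1][self::p]', $h3)->item(0);
        if ($p !== null) {
            $sections[trim($h3->textContent)] = trim($p->textContent);
        }
    }
    return $sections;
}

// Made-up sample: "Directions" is absent and the order is shuffled.
$html = <<<HTML
<div id="container">
   <h3>Warnings</h3>
   <p>Keep away from heat.</p>
   <h3>Instructions</h3>
   <p>Shake well before use.</p>
</div>
HTML;

$sections = extractSections($html);
echo $sections['Instructions'] ?? '(no instructions)', PHP_EOL;
echo $sections['Directions'] ?? '(no directions)', PHP_EOL;
```

Keying the result by heading text means the caller never has to care about order, and a missing heading is simply a missing array key. Symfony's DomCrawler, which Goutte builds on, also has nextAll() and first(), so inside an each() callback something like $node->nextAll()->first() may be another way to reach the following element. That is not tested here.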