ruby xpath css-selectors nokogiri mechanize-ruby

Get all tags following a certain element with Mechanize (Ruby)?

How can I get all the elements that follow a given one? For example:

<div id="exemple">
  <h2 class="target">foo</h2>
  <p>bla bla</p>
  <ul>
    <li>bar1</li>
    <li>bar2</li>
    <li>bar3</li>
  </ul>
  <h4>baz</h4> 
  <ul>
     <li>lot</li>
  </ul>
  <div>of</div>
  <p>possible</p>
  <p>tags</p>
  <a href="#">after</a>
</div>

I need to find <h2 class="target"> and get every tag up to the next <h4>, ignoring the <h4> itself and all tags after it. (If there is no <h4>, I have to get all tags up to the end of the parent, here the end of the <div>.)

The content is dynamic and unpredictable. The only rule is that we know there is a target and there is an <h4> (or the end of the parent element). I need to get all tags between the two and exclude all others.

With this example, I need to get the following HTML:

<h2 class="target">foo</h2>
<p>bla bla</p>
<ul>
  <li>bar1</li>
  <li>bar2</li>
  <li>bar3</li>
</ul>

I can already get the target with target = page.at('#exemple .target'). I know about the next_sibling method, but how can I test the tag type of the current node?

I'm thinking of something like this to walk the node tree:

html = ''
while not target.is_a? 'h4'
  html << target.inner_html
  target = target.next_sibling
end

How can I do this?

Solution

You can subtract the ones you don't want from your nodeset:

h2 = page.at('h2')
(h2.search('~ *') - h2.search('~ h4','~ h4 ~ *')).each do |el|
    # el is not a h4 and does not follow a h4
end

It might make more sense to use XPath, but this is something I can write without googling.

Your idea of iterating over the next siblings can work too:

el = page.at('h2 ~ *')
while el && el.name != 'h4'
    # do something with el
    el = el.at('+ *')
end