How can I get all the elements that follow a given one? For example:
<div id="exemple">
<h2 class="target">foo</h2>
<p>bla bla</p>
<ul>
<li>bar1</li>
<li>bar2</li>
<li>bar3</li>
</ul>
<h4>baz</h4>
<ul>
<li>lot</li>
</ul>
<div>of</div>
<p>possible</p>
<p>tags</p>
<a href="#">after</a>
</div>
I need to detect <h2 class="target"> and get all tags up to the next <h4>, ignoring the <h4> AND all tags after it. (If there is no <h4>, I have to get all tags up to the end of the parent, here the end of the <div>.)
The content is dynamic and unpredictable. The only rule is: we know there is a target, and there is an <h4> (or the end of the element). I need to get all tags between the two and exclude all others.
With this example, I need to get the following HTML:
<h2 class="target">foo</h2>
<p>bla bla</p>
<ul>
<li>bar1</li>
<li>bar2</li>
<li>bar3</li>
</ul>
I can already get the target with: target = page.at('#exemple .target')
I know about the next_sibling method, but how can I test the tag type of the current node?
I was thinking of something like this to walk the node tree:
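html = ''
# stop at the first <h4>, or when there are no more siblings (nil)
while target && target.name != 'h4'
  html << target.to_html # outer HTML, including the tag itself
  target = target.next_sibling
end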
How can I do this?
You can subtract the ones you don't want from your nodeset:
h2 = page.at('h2')
(h2.search('~ *') - h2.search('~ h4', '~ h4 ~ *')).each do |el|
  # el is not an h4 and does not follow an h4
end
It might make more sense to use XPath, but I can do it this way without googling.
Your idea of iterating over the next siblings can work too:
el = page.at('h2 ~ *')
while el && el.name != 'h4'
  # do something with el
  el = el.at('+ *')
end