
Using Ruby/Mechanize to select next element after selected element


I couldn't find this exact question, so hopefully I'm right that it's a new variation on an older one.

I want to select the table that follows a p.red element whose text() varies from page to page, specifically the 'p' whose text contains "OVERALL" but does not contain "Alphabetical".

The DOM looks something like this:

<p class=red>Some Text</p>
<table class="newclass">
  <tr></tr>
  <tr></tr>
</table>

<p class=red>Some Text</p>
<table class="newclass">
  <tr></tr>
  <tr></tr>
</table>

<p class=red>OVERALL</p>
<table class="newclass">
  <tr></tr>
  <tr></tr>
</table>
  • The number of tables varies from page to page.

I want to get that p tag's text() and also the table directly after it, again only where the text() contains "OVERALL" but not "Alphabetical". Should I build an array and .reject() the elements that don't match? I'm not sure, and I'm fairly new to Ruby and Mechanize. Thanks in advance for any help!


Solution

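  • Nokogiri's CSS selectors handle this cleanly. The ~ combinator matches every later sibling table, and at returns only the first match, which here is the table right after the paragraph: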

    require 'nokogiri'
    
    doc = Nokogiri::HTML(<<EOT)
    <p class=red>Some Text</p>
    <table class="newclass">
      <tr></tr>
      <tr></tr>
    </table>
    
    <p class=red>Some Text</p>
    <table class="newclass">
      <tr></tr>
      <tr></tr>
    </table>
    
    <p class=red>OVERALL</p>
    <table class="newclass">
      <tr></tr>
      <tr></tr>
    </table>
    EOT
    
    puts doc.at('p:contains("OVERALL")').to_html
    # >> <p class="red">OVERALL</p>
    
    puts doc.at('p:contains("OVERALL") ~ table').to_html
    # >> <table class="newclass">
    # >> <tr></tr>
    # >> <tr></tr>
    # >> </table>