Tags: ruby, xpath, nokogiri, scraper

Nokogiri Xpath Double Looping


What I'm trying to do is pull out each code block that contains a td with the class default. This works perfectly fine. But then I need to pick apart the different parts of each block. When I try to do this with a second xpath call, it prints all the comheads from every block on each iteration.

    def HeaderProcessor(doc)
        doc.xpath("//td[@class='default']").each do |block|
            puts block.xpath("//span[@class='comhead']").text
        end
    end

When I just print out block, each block prints once and contains the comment header and the comment. When I run the inner xpath, it prints EVERY comhead found in doc and seems to ignore the block variable.

Any ideas on how I can make this work? What am I misunderstanding about xpath?

UPDATE:

<td class="default">
<div style="margin-top:2px; margin-bottom:-10px; ">
<span class="comhead">
#some data        
</span></div>
<br><span class="comment"><font color="#000000">#some more data</span>
</td>

Solution

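  • A path that starts with // is absolute: it searches from the document root no matter which node you call xpath on, so //span[@class='comhead'] matches every comhead in doc. You want a path relative to block, such as */span[@class='comhead'] (or .//span[@class='comhead'] to match at any depth inside the block):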

    doc.xpath("//td[@class='default']").each do |block|
        block.xpath("*/span[@class='comhead']").each do |span|
            puts span.text
        end
    end
    

    or even just this:

    doc.xpath('//td[@class="default"]/*/span[@class="comhead"]').each do |span|
        puts span.text
    end
    

    if you don't need to do anything with the <td> elements.