Tags: ruby, xml, performance, nokogiri, node-set

XPath performance with merged Nokogiri::XML::NodeSets?


I am getting data from a web service, 100 <row> elements per page. My script joins these pages into a single Nokogiri::XML::NodeSet. Searching that NodeSet via XPath is extremely slow.

This code replaces the web service call and XML parsing, but the symptom is the same:

require 'nokogiri'

# Build five documents of 100 <row> elements each, mimicking the paged responses
rows = []
(1..500).to_a.each_slice(100) { |slice|
  rows << Nokogiri::XML::Builder.new { |xml|
    xml.root {
      xml.rows {
        slice.each { |num|
          xml.row {
            xml.NUMBER {
              xml.text num
            }
          }
        }
      }
    }
  }.doc.at('/root/rows')
}

# Merge the children of the five <rows> elements into one NodeSet
rows = rows.map { |a| a.children }.inject(:+)

The resulting NodeSet contains nodes from five documents. This seems to be a problem:

rows.map { |r| r.document.object_id }.uniq
  => [21430080, 21732480, 21901100, 38743080, 40472240]

The problem: the following code takes about ten seconds. With a non-merged NodeSet it finishes in the blink of an eye:

(1..500).to_a.sample(100).each do |sample|
  rows.at('//row[./NUMBER="%d"]' % sample)
end
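For reference, the slowdown can be measured with Ruby's Benchmark module. This is only a measurement sketch, not part of the original report; absolute numbers will vary by machine:

require 'benchmark'

# Time 100 random lookups against the merged, multi-document NodeSet
elapsed = Benchmark.realtime do
  (1..500).to_a.sample(100).each do |sample|
    rows.at('//row[./NUMBER="%d"]' % sample)
  end
end
puts '%.2f seconds' % elapsed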

Does somebody have a better way to merge the NodeSets, or a way to merge the documents themselves?

I would like to keep working with a single NodeSet, because the data really is one big set that the web service only splits into pages for technical reasons.


Solution

  • The key to merging the NodeSets is to detach each node with Node#remove and move it into the first document with Node#add_child:

    # "rows" is assumed to be the Array of <rows> elements built in the
    # question's first snippet (i.e. before the inject step). The first
    # element is kept; every <row> from the remaining elements is unlinked
    # from its own document and reparented into the first one.
    nodeset = nil
    rows.each do |slice|
      if nodeset.nil?
        nodeset = slice
      else
        slice.children.each do |row|
          nodeset.add_child(row.remove)
        end
      end
    end
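
  • A quick sanity check (a sketch, assuming the merge above was run against the array of <rows> elements from the question's first snippet; variable names are illustrative): every moved row should now report the same document, and the lookup from the question should finish quickly:

    merged = nodeset.children
    # All rows now live in one document
    merged.map { |r| r.document.object_id }.uniq.length   # expect 1

    # The lookup from the question, now against a single-document NodeSet
    (1..500).to_a.sample(100).each do |sample|
      merged.at('//row[./NUMBER="%d"]' % sample)
    end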