
How do I traverse an inner node using SAX in Nokogiri?


I'm quite new to Nokogiri and Ruby and seeking a little help.

I am parsing a very large XML file with a SAX handler, class MyDoc < Nokogiri::XML::SAX::Document. Now I want to read the child elements inside a block.

Here's the format of my XML file:

<Content id="83087">
    <Title></Title>
    <PublisherEntity id="1067">eBooksLib</PublisherEntity>
    <Publisher>eBooksLib</Publisher>
    ......
</Content>

I can already tell when the "Content" tag is found. Now I want to know how to traverse inside it. Here's my shortened code:

class MyDoc < Nokogiri::XML::SAX::Document
  #check the start element. set flag for each element
  def start_element name, attrs = []
    if(name == 'Content')
      #get the <Title>
      #get the <PublisherEntity>
      #get the Publisher
    end
  end


  def cdata_block(string)
    characters(string)
  end 

  def characters(str)
    puts str
  end
end

Solution

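  • It's trickier to do with SAX than with a DOM parser, because SAX only reports a stream of events (element start, element end, text) and never builds a tree you can traverse. You have to keep track of where you are yourself. I think the solution will need to look something like this: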

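    require 'nokogiri'

    class MyDoc < Nokogiri::XML::SAX::Document
      # Child elements of <Content> whose text should be printed.
      WANTED = %w[Title PublisherEntity Publisher].freeze

      # Remember when we enter <Content> and which element is currently open.
      def start_element(name, attrs = [])
        @inside_content = true if name == 'Content'
        @current_element = name
      end

      # Leaving <Content> clears the flag; no element is current until the next start tag.
      def end_element(name)
        @inside_content = false if name == 'Content'
        @current_element = nil
      end

      # Called for text nodes, possibly several times for one element.
      def characters(str)
        return unless @inside_content && WANTED.include?(@current_element)

        puts "#{@current_element} - #{str}"
      end
    end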