ruby-on-rails, ruby, xml, nokogiri, sax

Nokogiri gem won't parse the file using a SAX handler


I have an XML file with the header

<?xml version="1.0" encoding="utf-16"?>

and it also contains the root element

<transmission xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

When I use the SAX parser, the file will not parse. If I manually remove the encoding declaration and the attributes on the transmission element, parsing succeeds. Because the file is large, SAX is my only option. Is there a way to parse this XML file without manually removing the encoding and the transmission attributes?

Sample code:

    require 'nokogiri'

    # SAX handler that prints every element name as it opens and closes.
    class P < Nokogiri::XML::SAX::Document
      def start_element(element, attributes = [])
        puts element
      end

      def cdata_block(string)
      end

      def characters(string)
      end

      def end_element(element)
        puts element
      end
    end

    parser = Nokogiri::XML::SAX::Parser.new(P.new)
    parser.parse_file('file_dummy.xml')

Solution

  • After going through numerous references, I found the answer. It is based on the answer from @thetinman, although I did not fully follow it. I used a sed command to replace utf-16 with utf-8 in the file and then parsed it. The sed step is needed because Nokogiri has trouble with the utf-16 encoding declaration in this file.
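
For anyone who would rather stay in Ruby than shell out to sed, the same substitution can be sketched with the standard library alone. This is only a rough sketch under one assumption: the bytes of the file are really UTF-8 (or plain ASCII) and only the declaration claims utf-16, which is the situation in which a textual sed replacement would work at all. The helper name `rewrite_declared_encoding` and the output file name in the comment are made up for illustration.

```ruby
# Hypothetical helper: copy src_path to dst_path, changing only the encoding
# named in the XML declaration. Assumes the declaration sits within the first
# 1024 bytes and that the file's bytes are actually UTF-8/ASCII.
def rewrite_declared_encoding(src_path, dst_path, from: 'utf-16', to: 'utf-8')
  pattern = /(<\?xml[^>]*encoding=["'])#{Regexp.escape(from)}(["'])/i

  File.open(src_path, 'rb') do |src|
    File.open(dst_path, 'wb') do |dst|
      # Only the head of the file is inspected, so memory use stays small.
      head = src.read(1024) || ''
      dst.write(head.sub(pattern) { "#{$1}#{to}#{$2}" })

      # Copy the remainder in fixed-size chunks, untouched.
      while (chunk = src.read(64 * 1024))
        dst.write(chunk)
      end
    end
  end

  dst_path
end

# Possible use with the handler from the question (file names are examples):
#   fixed = rewrite_declared_encoding('file_dummy.xml', 'file_dummy.utf8.xml')
#   Nokogiri::XML::SAX::Parser.new(P.new).parse_file(fixed)
```

Writing a second file costs disk space, but the original stays untouched and the large document is never held in memory, which fits the reason for choosing SAX in the first place.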