ruby, xml, sax, open-uri

open-uri and SAX parsing for a giant XML document


I need to download a large external XML file (300MB+) and process it: run through the XML document and save elements in the database.

I am already doing this with no problem on a production server, using Saxerator to be gentle on memory. It works great. Here is my issue now:

I need to use open-uri (though there could be alternative solutions?) to grab the file to parse. The problem is that open-uri has to load the whole file before anything starts parsing, which defeats the entire purpose of using a SAX parser to save on memory. Any workarounds? Can I just read from the external XML document as a stream? I cannot load the entire file or it crashes my server, and since the document is updated every 30 minutes, I can't just save a copy of it on my server (though this is what I am doing currently to make sure everything is working).

P.S. I am doing this in Ruby.


Solution

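  • You may want to try Net::HTTP's streaming interface instead of open-uri. When Net::HTTPResponse#read_body is called with a block, it yields the response in chunks as they arrive instead of buffering it all. Those chunks can be fed through a pipe, so that Saxerator (via the underlying Nokogiri::SAX::Parser) reads from an IO object rather than the entire file.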
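
A minimal sketch of that approach, using only the standard library for the download. The helper name with_streamed_body is made up for illustration, and so are the URL and the :item tag in the usage comment; it assumes Saxerator.parser accepts any IO, as it does a File. Chunks from read_body are written into one end of an IO.pipe on a background thread while the parser reads from the other end, so only a small buffer is held in memory at a time.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of Net::HTTP or Saxerator): streams the
# response body of `uri` into a pipe and yields the read end, so a parser
# can consume the document as an IO while it is still downloading.
def with_streamed_body(uri)
  reader, writer = IO.pipe
  writer.binmode

  producer = Thread.new do
    begin
      Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
        http.request_get(uri.request_uri) do |response|
          response.value # raises unless the status is 2xx
          # With a block, read_body yields the body chunk by chunk
          # instead of buffering the whole document.
          response.read_body { |chunk| writer.write(chunk) }
        end
      end
    ensure
      writer.close # signals end-of-file to the reading side
    end
  end

  yield reader
ensure
  reader&.close
  producer&.join # re-raises any download error from the thread
end

# Usage with Saxerator (assumes the saxerator gem is installed; the URL
# and the :item tag are placeholders):
#
#   require "saxerator"
#   with_streamed_body(URI("https://example.com/feed.xml")) do |io|
#     Saxerator.parser(io).for_tag(:item).each do |item|
#       # save the element to the database here
#     end
#   end
```

The pipe applies back-pressure: if the parser falls behind, the writer blocks until the reader catches up, so the download never runs far ahead of the parsing. Error handling here is deliberately thin; a production version would probably want timeouts and retries, and would need to decide what to do with rows already saved when a download fails partway.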