I want to write approximately 50MB of data to an XML file.
I found Nokogiri (1.5.0) to be efficient for parsing, i.e. when only reading. For writing an XML file, however, it is not a good option, since it holds the complete XML document in memory until it finally writes it out.
I found Builder (3.0.0) to be a good option but I'm not sure if it's the best option.
I tried some benchmarks with the following simple code:
(1..500000).each do |k|
  xml.products {
    xml.widget {
      xml.id_ k
      xml.name "Awesome widget"
    }
  }
end
Nokogiri took about 143 seconds, and its memory consumption grew steadily, ending at about 700 MB.
Builder took about 123 seconds, and its memory consumption stayed stable at about 10 MB.
So is there a better solution to write huge XML files (50 MB) in Ruby?
Here's the code using Nokogiri:
require 'rubygems'
require 'nokogiri'
a = Time.now
# Nokogiri::XML::Builder builds the whole document tree in memory first...
builder = Nokogiri::XML::Builder.new do |xml|
  xml.root {
    (1..500000).each do |k|
      xml.products {
        xml.widget {
          xml.id_ k
          xml.name "Awesome widget"
        }
      }
    end
  }
end
# ...and only serializes it here, after the loop has finished
File.open("test_noko.xml", "w") { |o| o.write(builder.to_xml) }
puts((Time.now - a).to_s)
Here's the code using Builder:
require 'rubygems'
require 'builder'
a = Time.now
File.open("test.xml", 'w') { |f|
  # With :target => f, Builder appends each tag to the file as it goes,
  # so the complete document is never built up in memory
  xml = Builder::XmlMarkup.new(:target => f, :indent => 1)
  (1..500000).each do |k|
    xml.products {
      xml.widget {
        xml.id_ k
        xml.name "Awesome widget"
      }
    }
  end
}
puts((Time.now - a).to_s)
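Both benchmark scripts time themselves by subtracting two Time.now values. A minimal sketch of a reusable wrapper built on Ruby's standard Benchmark.realtime, which also reports the resulting file size so runs can be checked against the ~50 MB target. The helper name time_xml_write is hypothetical, and the block body is only a dependency-free stand-in; in the actual benchmark you would put the Builder or Nokogiri code inside the block.

```ruby
require 'benchmark'
require 'tmpdir'

# Hypothetical helper (not part of either gem): opens +path+ for writing,
# yields the File to the caller's block, and returns
# [elapsed_wall_clock_seconds, file_size_in_bytes].
def time_xml_write(path)
  seconds = Benchmark.realtime do
    File.open(path, 'w') { |f| yield f }
  end
  [seconds, File.size(path)]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.xml')
  secs, bytes = time_xml_write(path) do |f|
    # Stand-in body; replace with the Builder (or Nokogiri) code being measured.
    f.puts '<root>'
    1_000.times do |k|
      f.puts "<products><widget><id>#{k}</id><name>Awesome widget</name></widget></products>"
    end
    f.puts '</root>'
  end
  puts format('%.3f s, %d bytes', secs, bytes)
end
```

Benchmark.realtime measures wall-clock time, the same quantity as the original Time.now subtraction; it does not measure memory, so the memory figures still need an external tool.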
Solution 1
If speed is your main concern, I'd just use libxml-ruby directly:
$ time ruby test.rb
real 0m7.352s
user 0m5.867s
sys 0m0.921s
The API is pretty straightforward:
require 'rubygems'
require 'xml'
doc = XML::Document.new()
doc.root = XML::Node.new('root_node')
root = doc.root
500000.times do |k|
  # Append <products><widget id="k" name="Awesome widget"/></products>
  root << elem1 = XML::Node.new('products')
  elem1 << elem2 = XML::Node.new('widget')
  elem2['id'] = k.to_s
  elem2['name'] = 'Awesome widget'
end
doc.save('foo.xml', :indent => false, :encoding => XML::Encoding::UTF_8)
Using :indent => true doesn't make much difference in this case, but for more complex XML files it might.
$ time ruby test.rb #(with indent)
real 0m7.395s
user 0m6.050s
sys 0m0.847s
Solution 2
Of course, the fastest solution, and one that doesn't build up memory, is to write the XML manually. However, that easily introduces other sources of error, such as invalid XML:
$ time ruby test.rb
real 0m1.131s
user 0m0.873s
sys 0m0.126s
Here's the code:
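# The block form of File.open closes the file automatically
File.open("foo.xml", "w") do |f|
  f.puts('<doc>')
  500000.times do |k|
    # Each element is written straight to the file; nothing accumulates in memory
    f.puts "<product><widget id=\"#{k}\" name=\"Awesome widget\" /></product>"
  end
  f.puts('</doc>')
end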
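The main risk with hand-written XML is unescaped data: a name containing &, < or " would produce invalid XML. A minimal sketch that keeps the same line-by-line streaming approach but escapes the attribute value with Ruby's built-in String#encode(xml: :attr), which replaces &, <, > and " with entities and wraps the result in double quotes. The helper name write_widgets is hypothetical, and writing to a StringIO is only for demonstration.

```ruby
require 'stringio'

# Hypothetical helper: streams <doc><product><widget .../></product>...</doc>
# to any IO (File, StringIO, ...) one line at a time, as in Solution 2.
def write_widgets(io, count, name)
  # encode(xml: :attr) escapes &, <, > and " and adds the surrounding quotes,
  # so an arbitrary name cannot break the attribute or the document.
  # The value is constant, so it is escaped once, outside the loop.
  quoted_name = name.encode(xml: :attr)
  io.puts('<doc>')
  count.times do |k|
    # k is an Integer, so it needs no escaping
    io.puts "<product><widget id=\"#{k}\" name=#{quoted_name} /></product>"
  end
  io.puts('</doc>')
end

out = StringIO.new
write_widgets(out, 2, 'Salt & <Pepper> "widget"')
puts out.string
```

For the real file you would pass the File object instead, e.g. File.open("foo.xml", "w") { |f| write_widgets(f, 500000, "Awesome widget") }. Element names are still written verbatim, so they must already be valid XML names; text content (as opposed to attributes) can be escaped with encode(xml: :text).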