Tags: ruby-on-rails, ruby, soap, nokogiri, savon

Running out of memory parsing XML SOAP response in Rails with Savon and Nokogiri


I have a Rails 4 web app that consumes a SOAP web service endpoint. For each company, it sends a request to get a list of resources (it doesn't matter what kind, just information).

The method sends the request with Savon 2, gets the response, and parses it with Nokogiri so the XML resources can be handled with XPath.

The loop works fine until it reaches a specific company with a very large number of resources, far more than the others. Then the problems start. Watching the process with 'top' on Ubuntu, I can see that as soon as it starts processing the response it consumes RAM until the Rails app is killed. The memory is then released, but the web app is down.

Please find a code sample from inside the method:

# Initializing the Savon client
client = Savon.client(wsdl: endpoint,
                      log_level: :info,
                      log: true,
                      pretty_print_xml: true,
                      open_timeout: 300,
                      read_timeout: 300)

companies.each do |company|
  message = {'in0' => USER_ID,
             'in1' => USERNAME,
             'in2' => MMK_PASSWORD,
             'in3' => company.id}
  @logger.debug "getResources=1"
  response = client.call(:get_resources, message: message)
  @logger.debug "getResources=2"
  resourcesXML = response.to_hash[:get_resources_response][:out]
  @logger.debug "getResources=3"
  resourcesParsed = Nokogiri::XML(resourcesXML)
  @logger.info "getResources=4"
  resources = resourcesParsed.xpath("//resource")
  @logger.info "getResources=5"
  # ... further processing of the resources ...
end

The logs show output up to "getResources=3". Then the web app crashes.

What do you think is the best approach?

1. Is there a better way to process this information without killing the app?
2. Is there a way to process the response partially?
3. Are there tools with better performance for this scenario?
4. If none of the above is possible, can I only increase the RAM of my system? I have an Amazon AWS instance with 4 GB.


Solution

  • I would just like to explain how I solved this and share my insights. Probably the best approach when parsing big XML files is to use a SAX parser, as @dbugger suggested in the comments. It doesn't load the whole XML document into memory, which is why it solves the problem (see the first sketch below). However, in my case there are two inconveniences. First, performance is critical for us, and SAX parsers are slower than DOM parsers. Second, all our existing code uses the DOM parser, and we would have to redevelop everything.

    For those reasons, my approach is a kind of workaround: I just split the big XML document into smaller pieces that are easier for the DOM parser to handle (see the second sketch below).

    At the moment it's working fine, so the approach seems to hold up. If I run into any issues, I will update this answer.
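    For reference, a minimal sketch of the SAX approach with Nokogiri could look like this. The element name ('resource') is taken from the XPath in the question; collecting each element's attributes into a hash is just an assumption for illustration, since the real handler depends on what the <resource> elements actually contain:

      require 'nokogiri'

      # SAX handler: Nokogiri invokes these hooks while streaming through
      # the document, so the full XML is never held in memory at once.
      class ResourceHandler < Nokogiri::XML::SAX::Document
        attr_reader :resources

        def initialize
          @resources = []
        end

        def start_element(name, attrs = [])
          # Collect each <resource> element's attributes as a hash.
          @resources << attrs.to_h if name == 'resource'
        end
      end

      handler = ResourceHandler.new
      Nokogiri::XML::SAX::Parser.new(handler).parse(resourcesXML)
      handler.resources.each { |r| puts r.inspect }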
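    And one way to implement the splitting workaround (this is a sketch of the idea, not my exact code) is to stream over the document with Nokogiri::XML::Reader and hand each small <resource> fragment to the DOM parser, so the existing DOM/XPath code keeps working on one fragment at a time:

      require 'nokogiri'

      # The Reader streams through the big document node by node,
      # so only the current fragment needs to fit in memory.
      reader = Nokogiri::XML::Reader(resourcesXML)

      reader.each do |node|
        next unless node.name == 'resource' &&
                    node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

        # Re-parse just this small fragment with the DOM parser, so the
        # existing DOM-based code can be reused per resource.
        fragment = Nokogiri::XML(node.outer_xml)
        resource = fragment.at_xpath('/resource')
        # ... process the single resource node as before ...
      end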