Tags: ruby-on-rails, csv, ruby-on-rails-5

rails - Exporting a huge CSV file consumes all RAM in production


So my app exports an 11.5 MB CSV file and uses basically all of the RAM, which never gets freed.

The data for the CSV is taken from the DB, and in the case mentioned above the whole thing is being exported.

I am using Ruby 2.4.1's standard CSV library in the following fashion:

export_helper.rb:

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  data = Model.scope1(param).scope2(param).includes(:model1, :model2)
  data.each do |item|
    file << [
      item.method1,
      item.method2,
      item.method3
    ]
  end
  # repeat for other models - approx. 5 other similar loops
end

and then in the controller:

generator = ExportHelper::ReportGenerator.new
generator.full_report
respond_to do |format|
  format.csv do
    send_file(
      "#{Rails.root}/full_report.csv",
      filename: 'full_report.csv',
      type: :csv,
      disposition: :attachment
    )
  end
end

After a single request the Puma processes occupy 55% of the server's RAM and stay that way until the server eventually runs out of memory completely.

For instance, in this article generating a million-line, 75 MB CSV file required only 1 MB of RAM. But there was no DB querying involved.
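The constant-memory behaviour described in the article is easy to verify in isolation. A minimal plain-Ruby sketch (no DB, illustrative field names): generating rows through an Enumerator means only one line exists at a time, so memory stays flat regardless of how many rows are produced.

```ruby
require 'csv'

# Build CSV lines lazily: each row is created on demand and can be
# written or streamed immediately, then garbage-collected.
rows = Enumerator.new do |yielder|
  1_000_000.times do |i|
    yielder << CSV.generate_line([i, "item-#{i}"])
  end
end

# Consuming a couple of rows materializes nothing else.
rows.take(2)  # => ["0,item-0\n", "1,item-1\n"]
```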

The server has 1015 MB RAM + 400 MB of swap memory.

So my questions are:

  • What exactly consumes so much memory? Is it the CSV generation or the communication with the DB?
  • Am I doing something wrong and missing a memory leak? Or is it just how the library works?
  • Is there a way to free up the memory without restarting the Puma workers?

Thanks in advance!


Solution

  • Instead of each you should be using find_each, which is designed specifically for cases like this: it instantiates the models in batches (1,000 records by default) and releases them afterwards, whereas each instantiates all of them at once.

    CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
      Model.scope1(param).find_each do |item|
        file << [
          item.method1
        ]
      end
    end
    

    Furthermore you should stream the CSV instead of writing it to memory or disk before sending it to the browser:

    format.csv do
      headers["Content-Type"] = "text/csv"
      headers["Content-disposition"] = "attachment; filename=\"full_report.csv\""
    
      # streaming_headers
      # nginx doc: Setting this to "no" will allow unbuffered responses suitable for Comet and HTTP streaming applications
      headers['X-Accel-Buffering'] = 'no'
      headers["Cache-Control"] ||= "no-cache"
    
      # Rack::ETag 2.2.x no longer respects 'Cache-Control'
      # https://github.com/rack/rack/commit/0371c69a0850e1b21448df96698e2926359f17fe#diff-1bc61e69628f29acd74010b83f44d041
      headers["Last-Modified"] = Time.current.httpdate
    
      headers.delete("Content-Length")
      response.status = 200
    
      header = ['Method 1', 'Method 2']
      csv_options = { col_sep: ";" }
    
      csv_enumerator = Enumerator.new do |y|
        y << CSV::Row.new(header, header).to_s(csv_options)
        Model.scope1(param).find_each do |item|
          y << CSV::Row.new(header, [item.method1, item.method2]).to_s(csv_options)
        end
      end
    
      # setting the body to an enumerator; Rails will iterate it and
      # send each yielded chunk to the client as it is produced
      self.response_body = csv_enumerator
    end
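
This streams because Rack treats any object that responds to each as a response body and writes every yielded chunk to the socket as it is produced. A minimal sketch of that consumption, without Rails or a DB (the header names and sample values are illustrative, mirroring the controller code above):

```ruby
require 'csv'

header = ['Method 1', 'Method 2']
csv_options = { col_sep: ';' }

# Same shape as the controller's enumerator: one CSV line per yield.
body = Enumerator.new do |y|
  y << CSV::Row.new(header, header).to_s(**csv_options)
  [['a', 1], ['b', 2]].each do |method1, method2|
    y << CSV::Row.new(header, [method1, method2]).to_s(**csv_options)
  end
end

# Rack/Rails consumes the body chunk by chunk, roughly like this;
# at no point does the full CSV exist in memory.
chunks = []
body.each { |chunk| chunks << chunk }
chunks
# => ["Method 1;Method 2\n", "a;1\n", "b;2\n"]
```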