mysql, ruby-on-rails, ruby, heroku, delayed-job

How to best import and process very large CSV files with Rails


I am building a Rails app that I am deploying with Heroku, and I need to be able to import and process large CSV files (5000+ lines).

Doing it in the controller using the built-in Ruby CSV parser takes over 30 seconds and causes the Heroku dyno to time out.
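
Roughly what the controller does now (Record is a placeholder for the actual model):

    require 'csv'

    class ImportsController < ApplicationController
      def create
        # Parsing and inserting every row inside the request/response cycle
        # is what exceeds Heroku's 30-second request timeout.
        CSV.foreach(params[:file].path, headers: true) do |row|
          Record.create!(row.to_hash)
        end
        redirect_to records_path
      end
    end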

I was thinking of putting the CSV into the database and then processing it with a delayed_job, but this method maxes out at just over 4200 lines.

I am using MySQL with a LONGTEXT column for the file contents, so the database should be able to handle it (rough sketch of that setup below).
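
For context, roughly the setup I tried (CsvUpload and the other names here are placeholders):

    require 'csv'

    # migration: store the raw file in a LONGTEXT column
    class CreateCsvUploads < ActiveRecord::Migration
      def change
        create_table :csv_uploads do |t|
          t.text :data, limit: 4294967295   # maps to LONGTEXT on MySQL
          t.timestamps
        end
      end
    end

    # model: parse the stored CSV in the background via delayed_job
    class CsvUpload < ActiveRecord::Base
      def process
        CSV.parse(data, headers: true) do |row|
          Record.create!(row.to_hash)
        end
      end
      handle_asynchronously :process
    end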

Any ideas for this use case?


Solution

    • To import the CSV faster, my suggestion is to use the smarter_csv gem; you can check its repository at tilo/smarter_csv
    • As stated on their site: > smarter_csv is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with Mongoid or ActiveRecord, and parallel processing with Resque or Sidekiq
    • I use this gem combined with Resque (see the Gemfile sketch just below this list)
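
    If it helps, the gems involved look like this in the Gemfile (versions omitted):

      # Gemfile
      gem 'smarter_csv'   # chunked CSV parsing into arrays of hashes
      gem 'resque'        # Redis-backed background job processing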

    Below is sample code to import the file:

      # Read the CSV in chunks of 100 rows; each chunk is an array of hashes.
      # (Without a :chunk_size option, smarter_csv yields one row at a time,
      # which would create one Resque job per row.)
      n = SmarterCSV.process(params[:file].path, chunk_size: 100) do |chunk|
        Resque.enqueue(ImportDataMethod, chunk)
      end
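
    The Resque worker referenced above would look roughly like this (the class name ImportDataMethod comes from the snippet above; Record and its attributes are placeholders):

      class ImportDataMethod
        @queue = :import   # Resque queue name

        # chunk arrives as the array of row hashes built by smarter_csv;
        # Resque round-trips arguments through JSON, so keys come back as strings
        def self.perform(chunk)
          chunk.each do |row|
            Record.create!(row)
          end
        end
      end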
    

    After reading the file, it passes the data records to Resque, which then imports them in the background (if you are on Rails 4.2 or above, you can combine this with Rails Active Job, as sketched below).
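
    A rough sketch of the Active Job variant (job and model names are illustrative):

      # config/application.rb
      config.active_job.queue_adapter = :resque

      # app/jobs/import_data_job.rb
      class ImportDataJob < ActiveJob::Base
        queue_as :import

        def perform(chunk)
          chunk.each { |row| Record.create!(row) }
        end
      end

    In the controller, ImportDataJob.perform_later(chunk) then replaces the Resque.enqueue call.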