ruby, ruby-on-rails-3, rake

Rake task to import data from a txt file in Ruby?


I would like to import data from a txt file into a database in Ruby. I have tried to create a rake task to do so, but I am struggling to find an elegant way of doing it.

My Rake task so far:

desc "Import schools." 
  task :import_schools => :environment do
    File.open(File.join(Rails.root, "imports", "schools.txt"), "r").each do |line|
        if ! line.valid_encoding?
          s = line.encode("UTF-16be", :invalid=>:replace, :replace=>"?").encode('UTF-8')
          s.gsub(/dr/i,'med')
          description, time, standards, books, choices = s.strip.split("\t")
          u = ImportResult.new(:description => description, :time => time)
          u.save
      end
    end
  end

My txt file data looks like:

primary 23484775884 standard:fifth book:science choice:maths name:Joseph city:London
secondary 46537728836 standard:fourth book:english choice:maths name:Jain city:Manchester
.........

I want to insert each of these records into the ImportResult table and ignore the name and city of each record.

Expected Result

ImportResult Table:

id: 1
description: primary
time: 23484775884
standard: fifth
bookname: science

Thanks


Solution

  • The main challenge here is this: How do I turn a line from the text file into a hash of attributes I can pass to MyModel.create? Since each line in the text file has the fields in the same order, one simple approach is to just use a regular expression:

    LINE_TO_ATTRS_EXPR = /
      \A
      (?<description>\w+)\s+
      (?<time>\d+)\s+
      standard:(?<standard>\w+)\s+
      book:(?<book>\w+)\s+
      choice:(?<choice>\w+)\s
    /x
    
    def line_to_attrs(line)
      matches = LINE_TO_ATTRS_EXPR.match(line)
      Hash[ matches.names.zip(matches.captures) ]
    end
    
    p line_to_attrs("primary 23484775884 standard:fifth book:science choice:maths name:Joseph city:London")
    # => { "description" => "primary",
    #      "time" => "23484775884",
    #      "standard" => "fifth",
    #      "book" => "science",
    #      "choice" => "maths" }
    

    Here I've assumed that the time field will always be a string of digits (\d+) and that fields are separated by whitespace (\s+).
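
    One thing the regex version does not handle is a line that fails to match: match then returns nil and the call to names raises NoMethodError. A minimal sketch of a nil-safe variant follows; the name line_to_attrs_or_nil is hypothetical, and the pattern is copied from above under a different constant name so the snippet runs on its own.

```ruby
# Same pattern as in the answer, repeated so this snippet is self-contained.
LINE_PATTERN = /
  \A
  (?<description>\w+)\s+
  (?<time>\d+)\s+
  standard:(?<standard>\w+)\s+
  book:(?<book>\w+)\s+
  choice:(?<choice>\w+)\s
/x

# Hypothetical variant of line_to_attrs: returns nil instead of raising
# when the line does not have the expected shape.
def line_to_attrs_or_nil(line)
  matches = LINE_PATTERN.match(line)
  return nil unless matches

  Hash[matches.names.zip(matches.captures)]
end

good = line_to_attrs_or_nil("primary 23484775884 standard:fifth book:science choice:maths name:Joseph city:London")
bad  = line_to_attrs_or_nil("not a valid record")

puts good["standard"]  # fifth
puts bad.nil?          # true
```

    The caller can then skip or log nil results rather than aborting the whole import on one malformed line.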

    Another approach is to split the line on whitespace and then split each of those parts on the colon (:) and use the part on the left as the key and the part on the right as the value. Since the first two fields are in a different format, we pull them off the array first. Then we can use each_with_object to put them into a Hash, skipping the keys we don't want:

    def line_to_attrs(line)
      attrs = {}
      attrs[:description], attrs[:time], *rest = line.split(/\s+/)
    
      # each_with_object returns the memo (attrs here), so that is what the method returns
      rest.each_with_object(attrs) do |part, memo|
        key, val = part.split(':', 2)
        next if key =~ /^(name|city)$/
        memo[key.to_sym] = val
      end
    end
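
    The hash keys produced this way come straight from the file (book, choice, ...), while the Expected Result in the question lists a bookname column and no choice column. If the table really looks like that, a whitelist that maps file keys to column names may be safer than a blacklist. This is only a sketch: COLUMN_FOR_KEY and line_to_columns are hypothetical names, and the mapping is an assumption based on the question, not on a known schema.

```ruby
# Assumed mapping from keys in the file to column names, taken from the
# "Expected Result" in the question (book -> bookname). Adjust to your schema.
COLUMN_FOR_KEY = { "standard" => :standard, "book" => :bookname }.freeze

def line_to_columns(line)
  # String#split with no arguments splits on runs of whitespace.
  description, time, *rest = line.split
  attrs = { description: description, time: time }

  rest.each_with_object(attrs) do |part, memo|
    key, val = part.split(":", 2)
    column = COLUMN_FOR_KEY[key]
    memo[column] = val if column   # keys without a mapping are ignored
  end
end

result = line_to_columns("primary 23484775884 standard:fifth book:science choice:maths name:Joseph city:London")
puts result.keys.inspect   # [:description, :time, :standard, :bookname]
puts result[:bookname]     # science
```

    With a whitelist, a new key appearing in the file is silently dropped instead of causing an unknown-attribute error in create!.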
    

    Whichever method you choose, you can now apply it to each line to get a hash of attributes to pass to ImportResult.create!:

    File.open(Rails.root + "imports/schools.txt", "r") do |file|
      ImportResult.transaction do
        file.each_line do |line|
          ImportResult.create!(line_to_attrs(line))
        end
      end
    end
    

    Note that I used File.open(...) do ... instead of File.open(...).each do. Using open with a block ensures that the file will be closed when the operation is complete, even if errors occur.
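
    The difference can be checked outside Rails. The sketch below uses a Tempfile as a stand-in for imports/schools.txt and a hypothetical helper, closed_after_failed_block?, that raises inside the block and then reports whether the handle was closed.

```ruby
require "tempfile"

# Hypothetical helper: opens the file with a block, raises inside it,
# and reports whether the file handle ended up closed.
def closed_after_failed_block?(path)
  handle = nil
  begin
    File.open(path, "r") do |file|
      handle = file
      raise "simulated failure while importing"
    end
  rescue RuntimeError
    # The error still propagates out of File.open; by then the file is closed.
  end
  handle.closed?
end

# Temporary stand-in for imports/schools.txt.
sample = Tempfile.new("schools")
sample.write("primary 23484775884 standard:fifth\n")
sample.close

puts closed_after_failed_block?(sample.path)  # true

# Without a block, closing is the caller's job.
leaky = File.open(sample.path, "r")
puts leaky.closed?                            # false
leaky.close
sample.unlink
```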

    If your input is large, however, you may find that this is slow. That's because you're creating an ActiveRecord object for each line and performing one insert at a time. Doing it in a transaction helps, but only so much. If performance becomes an issue, I suggest looking at the activerecord-import gem.
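
    To prepare for a bulk insert, the lines can be parsed into batches first. The following is a rough sketch in plain Ruby: each_batch is a hypothetical helper, StringIO stands in for the real file, and the compact line_to_attrs is repeated only so the snippet runs on its own. With activerecord-import, each batch would then be handed to something like ImportResult.import(records); check the gem's README for the exact call that fits your version.

```ruby
require "stringio"

# Compact copy of the split-based parser, so this snippet runs on its own.
def line_to_attrs(line)
  description, time, *rest = line.split
  rest.each_with_object(description: description, time: time) do |part, memo|
    key, val = part.split(":", 2)
    next if key =~ /\A(name|city)\z/
    memo[key.to_sym] = val
  end
end

# Hypothetical helper: yields arrays of at most batch_size attribute hashes,
# skipping blank lines, without loading the whole file into memory.
def each_batch(io, batch_size)
  batch = []
  io.each_line do |line|
    next if line.strip.empty?
    batch << line_to_attrs(line)
    if batch.size >= batch_size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty?
end

input = StringIO.new(<<~TXT)
  primary 23484775884 standard:fifth book:science choice:maths name:Joseph city:London

  secondary 46537728836 standard:fourth book:english choice:maths name:Jain city:Manchester
TXT

each_batch(input, 1000) do |batch|
  # In the Rails app, this is where a bulk insert of the batch would go.
  puts batch.size  # 2
end
```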