I would like to import the data from a txt file into a database in Ruby. I have tried to create a rake task to do so, and I am struggling to find an elegant way of doing it.
desc "Import schools."
task :import_schools => :environment do
  File.open(File.join(Rails.root, "imports", "schools.txt"), "r").each do |line|
    # Replace invalid byte sequences, but still process lines that are already valid
    s = line
    unless s.valid_encoding?
      s = s.encode("UTF-16be", :invalid => :replace, :replace => "?").encode('UTF-8')
    end
    s = s.gsub(/dr/i, 'med') # gsub returns a new string, so keep the result
    description, time, standards, books, choices = s.strip.split("\t")
    u = ImportResult.new(:description => description, :time => time)
    u.save
  end
end
Sample lines from schools.txt:
primary 23484775884 standard:fifth book:science choice:maths name:Joseph city:London
secondary 46537728836 standard:fourth book:english choice:maths name:Jain city:Manchester
.........
I want to insert each of these records into the ImportResult table, ignoring the name and city of each record.
Expected ImportResult row for the first line:
id: 1
description: primary
time: 23484775884
standard: fifth
bookname: science
Thanks
The main challenge here is this: How do I turn a line from the text file into a hash of attributes I can pass to MyModel.create? Since each line in the text file has the fields in the same order, one simple approach is to just use a regular expression:
LINE_TO_ATTRS_EXPR = /
  \A
  (?<description>\w+)\s+
  (?<time>\d+)\s+
  standard:(?<standard>\w+)\s+
  book:(?<book>\w+)\s+
  choice:(?<choice>\w+)   # name and city, if present, are not captured
/x

def line_to_attrs(line)
  matches = LINE_TO_ATTRS_EXPR.match(line)
  raise ArgumentError, "Unrecognized line: #{line.inspect}" unless matches
  Hash[ matches.names.zip(matches.captures) ]
end
p line_to_attrs("primary 23484775884 standard:fifth book:science choice:maths name:Joseph city:London")
# => { "description" => "primary",
# "time" => "23484775884",
# "standard" => "fifth",
# "book" => "science",
# "choice" => "maths" }
Here I've assumed that the time field will always be a string of digits (\d+), that every other value is a single word (\w+), and that fields are separated by whitespace (\s+).
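On Ruby 2.4 and later, MatchData#named_captures returns the group-name-to-capture hash directly, so the Hash[...zip...] step isn't needed. Here is a minimal sketch of the regex approach written that way; it also raises an error for lines that don't fit the assumed format, instead of failing later with a NoMethodError on nil. The constant name SCHOOL_LINE_EXPR is just for this example.

```ruby
# Same pattern as above: two positional fields, then standard/book/choice.
# name and city, if present, are simply left unmatched.
SCHOOL_LINE_EXPR = /
  \A
  (?<description>\w+)\s+
  (?<time>\d+)\s+
  standard:(?<standard>\w+)\s+
  book:(?<book>\w+)\s+
  choice:(?<choice>\w+)
/x

def line_to_attrs(line)
  match = SCHOOL_LINE_EXPR.match(line)
  # Fail loudly on lines that don't follow the assumed format
  raise ArgumentError, "unrecognized line: #{line.inspect}" if match.nil?

  match.named_captures # Hash of group name (String) => captured String
end

p line_to_attrs("primary 23484775884 standard:fifth book:science choice:maths name:Joseph city:London")
```

Chaining .transform_keys(&:to_sym) (Ruby 2.5+) would give symbol keys, though create! accepts string keys as well.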
Another approach is to split the line on whitespace, then split each of those parts on the colon (:), using the part on the left as the key and the part on the right as the value. Since the first two fields are in a different format, we pull them off the array first. Then we can use each_with_object to put the rest into a Hash, skipping the keys we don't want:
def line_to_attrs(line)
  attrs = {}
  # The first two fields are positional; everything after them is key:value
  attrs[:description], attrs[:time], *rest = line.split
  rest.each_with_object(attrs) do |part, hash|
    key, val = part.split(':', 2)
    next if %w[name city].include?(key) # skip the fields we don't want
    hash[key.to_sym] = val
  end
end
Whichever method you choose, you can now apply it to each line to get a hash of attributes to pass to ImportResult.create!:
File.open(Rails.root + "imports/schools.txt", "r") do |file|
  ImportResult.transaction do
    file.each_line do |line|
      ImportResult.create!(line_to_attrs(line))
    end
  end
end
Note that I used File.open(...) do ... instead of File.open(...).each do. Using open with a block ensures that the file will be closed when the operation is complete, even if errors occur.
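One detail worth checking before running either version: the question's table names the column bookname and shows no choice column, while both parsers produce book and choice keys. ActiveRecord raises ActiveModel::UnknownAttributeError when create! receives an attribute the model doesn't have, so the hash may need renaming and filtering first. A minimal sketch, assuming the columns are exactly the ones shown in the question; KEY_TO_COLUMN, COLUMNS and to_column_attrs are hypothetical names, and in a Rails app the column list could come from ImportResult.column_names instead of being hard-coded.

```ruby
# Hypothetical mapping from keys in the text file to ImportResult column names
KEY_TO_COLUMN = { "book" => "bookname" }.freeze

# Hypothetical column list, assumed from the table shown in the question
COLUMNS = %w[description time standard bookname].freeze

# Renames keys (accepting String or Symbol keys) and drops any key with no column.
# Uses Hash#transform_keys and Hash#slice, both available since Ruby 2.5.
def to_column_attrs(attrs)
  attrs
    .transform_keys { |key| KEY_TO_COLUMN.fetch(key.to_s, key.to_s) }
    .slice(*COLUMNS)
end

p to_column_attrs({ description: "primary", time: "23484775884", standard: "fifth", book: "science", choice: "maths" })
```

In the import loop this would become ImportResult.create!(to_column_attrs(line_to_attrs(line))).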
If your input is large, however, you may find that this is slow. That's because you're creating an ActiveRecord object for each line and performing one insert at a time. Doing it in a transaction helps, but only so much. If performance becomes an issue, I suggest looking at the activerecord-import gem.
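The idea behind a bulk import is to send many rows per INSERT instead of one. Here is a minimal, framework-free sketch of the batching half, assuming a parser such as line_to_attrs. each_attrs_batch is a hypothetical helper, and what you do with each batch (for example ImportResult.insert_all on Rails 6+, or the import method from the activerecord-import gem) is left to the caller.

```ruby
# Hypothetical helper: parses lines and yields them in groups of batch_size.
# `lines` can be any object whose #each yields lines, such as an open File.
# `parse` is any callable that turns one line into an attributes hash.
def each_attrs_batch(lines, batch_size, parse)
  batch = []
  lines.each do |line|
    next if line.strip.empty? # ignore blank lines
    batch << parse.call(line)
    if batch.size == batch_size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty? # flush the last, possibly smaller, batch
end

# Example with in-memory lines and a trivial parser
sample = ["primary 1\n", "\n", "secondary 2\n", "primary 3\n"]
each_attrs_batch(sample, 2, ->(l) { l.split.first }) do |batch|
  p batch
end
```

In the rake task this might look like File.open(path) { |file| each_attrs_batch(file, 1000, method(:line_to_attrs)) { |batch| ImportResult.insert_all(batch) } }. Keep in mind that insert_all skips validations and callbacks, and, as far as I know, expects every hash in a batch to have the same keys.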