I am busy writing an migration that will allow us to move our yamler from Syck to Psych and finally upgrade our project to ruby 2. This migration is going to be seriously resource intensive though so I am going to need to use chunking.
I wrote the following method to confirm that the result of the migration I plan to use produces the expected result and can be done without down time. To avoid Active record performing the serialization automatically I needed to use ActiveRecord::Base.connection.execute
My method that describes the transformation is as follows
def show_summary(table, column_name)
a = ActiveRecord::Base.connection.execute <<-SQL
SELECT id, #{column_name} FROM #{table}
SQL
all_rows = a.to_a; ""
problem_rows = all_rows.select do |row|
original_string = Syck.dump(Syck.load(row[1]))
orginal_object = Syck.load(original_string)
new_string = Psych.dump(orginal_object)
new_object = Syck.load(new_string)
Syck.dump(new_object) != original_string rescue true
end
problem_rows.map do |row|
old_string = Syck.dump(Syck.load(row[1]))
new_string = Psych.dump(Syck.load(old_string)) rescue "Parse failure"
roundtrip_string = begin
Syck.dump(Syck.load(new_string))
rescue => e
e.message
end
new_row = {}
new_row[:id] = row[0]
new_row[:original_encoding] = old_string
new_row[:new_encoding] = roundtrip_string
new_row
end
end
How can you use batching when making use of ActiveRecord::Base.connection.execute
?
For completeness my update function is as follows
# Migrate the given serialized YAML column from Syck to Psych
# (if any).
def migrate_to_psych(table, column)
table_name = ActiveRecord::Base.connection.quote_table_name(table)
column_name = ActiveRecord::Base.connection.quote_column_name(column)
fetch_data(table_name, column_name).each do |row|
transformed = ::Psych.dump(convert(Syck.load(row[column])))
ActiveRecord::Base.connection.execute <<-SQL
UPDATE #{table_name}
SET #{column_name} = #{ActiveRecord::Base.connection.quote(transformed)}
WHERE id = #{row['id']};
SQL
end
end
def fetch_data(table_name, column_name)
ActiveRecord::Base.connection.select_all <<-SQL
SELECT id, #{column_name}
FROM #{table_name}
WHERE #{column_name} LIKE '---%'
SQL
end
Which I got from http://fossies.org/linux/openproject/db/migrate/migration_utils/legacy_yamler.rb
You can easily build something with SQL's LIMIT
and OFFSET
clauses:
def fetch_data(table_name, column_name)
batch_size, offset = 1000, 0
begin
batch = ActiveRecord::Base.connection.select_all <<-SQL
SELECT id, #{column_name}
FROM #{table_name}
WHERE #{column_name} LIKE '---%'
LIMIT #{batch_size}
OFFSET #{offset}
SQL
batch.each do |row|
yield row
end
offset += batch_size
end until batch.empty?
end
which you can use almost exactly the same as before, just without the .each
:
fetch_data(table_name, column_name) do |row| ... end
HTH!