Tags: ruby-on-rails, activerecord, batching, syck, psych, parser

Batching when using ActiveRecord::Base.connection.execute


I am busy writing a migration that will allow us to move our yamler from Syck to Psych and finally upgrade our project to Ruby 2. This migration is going to be seriously resource intensive, though, so I am going to need to use chunking.

I wrote the following method to confirm that the transformation I plan to use produces the expected result and can be done without downtime. To avoid ActiveRecord performing the serialization automatically, I needed to use ActiveRecord::Base.connection.execute.
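To show why that matters, here is a rough sketch of the difference (the User model and its serialized preferences column are just stand-ins, and I'm assuming an adapter where execute returns rows as arrays, which is why row[1] works in the code below):

    # Sketch only: `User` and `preferences` are hypothetical stand-ins.
    class User < ActiveRecord::Base
      serialize :preferences   # reads through the model are deserialized automatically
    end

    User.find(1).preferences
    # => { :theme => "dark" }   (already a Ruby object, the current yamler was applied)

    row = ActiveRecord::Base.connection.execute(
      "SELECT id, preferences FROM users LIMIT 1"
    ).to_a.first
    row[1]
    # => the raw YAML string, e.g. "--- \n:theme: dark\n", untouched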

My method that describes the transformation is as follows:

    def show_summary(table, column_name)
      result = ActiveRecord::Base.connection.execute <<-SQL
        SELECT id, #{column_name} FROM #{table}
      SQL
      all_rows = result.to_a

      # Rows whose YAML does not survive a Syck -> Psych -> Syck round trip
      problem_rows = all_rows.select do |row|
        original_string = Syck.dump(Syck.load(row[1]))
        original_object = Syck.load(original_string)

        new_string = Psych.dump(original_object)
        new_object = Syck.load(new_string)

        Syck.dump(new_object) != original_string rescue true
      end

      # For each problem row, show the before/after encodings side by side
      problem_rows.map do |row|
        old_string = Syck.dump(Syck.load(row[1]))
        new_string = Psych.dump(Syck.load(old_string)) rescue "Parse failure"
        roundtrip_string = begin
          Syck.dump(Syck.load(new_string))
        rescue => e
          e.message
        end

        new_row = {}
        new_row[:id] = row[0]
        new_row[:original_encoding] = old_string
        new_row[:new_encoding] = roundtrip_string
        new_row
      end
    end
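I call it from the console roughly like this (the table and column names here are placeholders for whichever serialized columns need checking):

    # Placeholder names -- substitute your own table and serialized column.
    problem_rows = show_summary('users', 'preferences')
    problem_rows.each do |row|
      puts "id=#{row[:id]}"
      puts "  was: #{row[:original_encoding].inspect}"
      puts "  now: #{row[:new_encoding].inspect}"
    end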

How can I use batching when working with ActiveRecord::Base.connection.execute?

For completeness, my update function is as follows:

  # Migrate the given serialized YAML column from Syck to Psych
  # (if any).
  def migrate_to_psych(table, column)
    table_name = ActiveRecord::Base.connection.quote_table_name(table)

    column_name = ActiveRecord::Base.connection.quote_column_name(column)

    fetch_data(table_name, column_name).each do |row|
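      # `convert` is a helper defined outside this snippet; it is not shown here.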
      transformed = ::Psych.dump(convert(Syck.load(row[column])))

      ActiveRecord::Base.connection.execute <<-SQL
         UPDATE #{table_name}
         SET #{column_name} = #{ActiveRecord::Base.connection.quote(transformed)}
         WHERE id = #{row['id']};
      SQL
    end
  end

  def fetch_data(table_name, column_name)
    ActiveRecord::Base.connection.select_all <<-SQL
       SELECT id, #{column_name}
       FROM #{table_name}
       WHERE #{column_name} LIKE '---%'
    SQL
  end

I got these from http://fossies.org/linux/openproject/db/migrate/migration_utils/legacy_yamler.rb
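For reference, this is roughly how I intend to call those helpers from the migration itself. The table/column pairs below are placeholders, and I'm assuming the helper methods above are defined in (or mixed into) the migration class:

    # Sketch only: the table/column pairs are placeholders.
    class MigrateSerializedYamlToPsych < ActiveRecord::Migration
      def up
        migrate_to_psych 'users',   'preferences'
        migrate_to_psych 'reports', 'settings'
      end

      def down
        # One-way data migration: Psych can read what it writes,
        # so there is nothing meaningful to roll back.
      end
    end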


Solution

  • You can easily build something with SQL's LIMIT and OFFSET clauses:

    def fetch_data(table_name, column_name)
      batch_size, offset = 1000, 0
      begin
        batch = ActiveRecord::Base.connection.select_all <<-SQL
          SELECT id, #{column_name}
          FROM #{table_name}
          WHERE #{column_name} LIKE '---%'
          LIMIT #{batch_size}
          OFFSET #{offset}
        SQL
        batch.each do |row|
          yield row
        end
        offset += batch_size
      end until batch.empty?
    end
    

    which you can use almost exactly as before, just without the .each:

    fetch_data(table_name, column_name) do |row| ... end
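
    One caveat: with very large tables, OFFSET forces the database to skip over all the previously processed rows on every iteration, so batches get slower as you go. If that becomes a problem, a keyset-style variant that pages on the last seen id behaves better. This is only a sketch with the same yield interface, and it assumes an integer id primary key:

    def fetch_data(table_name, column_name)
      batch_size, last_id = 1000, 0
      loop do
        batch = ActiveRecord::Base.connection.select_all <<-SQL
          SELECT id, #{column_name}
          FROM #{table_name}
          WHERE #{column_name} LIKE '---%' AND id > #{last_id}
          ORDER BY id
          LIMIT #{batch_size}
        SQL
        rows = batch.to_a
        break if rows.empty?
        rows.each { |row| yield row }
        last_id = rows.last['id'].to_i
      end
    end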
    

    HTH!