ruby-on-rails-3activerecordduplicatesdestroy

Remove duplicate records based on multiple columns?


I'm using Heroku to host my Ruby on Rails application and for one reason or another, I may have some duplicate rows.

Is there a way to delete duplicate records based on 2 or more criteria but keep just 1 record of that duplicate collection?

In my use case, I have a Make and Model relationship for cars in my database.

Make      Model
---       ---
Name      Name
          Year
          Trim
          MakeId

I'd like to delete all Model records that have the same Name, Year and Trim but keep 1 of those records (meaning, I need the record but only once). I'm using Heroku console so I can run some active record queries easily.

Any suggestions?


Solution

  • class Model
    
      def self.dedupe
        # find all models and group them on keys which should be common
        grouped = all.group_by{|model| [model.name,model.year,model.trim,model.make_id] }
        grouped.values.each do |duplicates|
          # the first one we want to keep right?
          first_one = duplicates.shift # or pop for last one
          # if there are any more left, they are duplicates
          # so delete all of them
          duplicates.each{|double| double.destroy} # duplicates can now be destroyed
        end
      end
    
    end
    
    Model.dedupe
    
    • Find All
    • Group them on keys which you need for uniqueness
    • Loop on the grouped model's values of the hash
    • remove the first value because you want to retain one copy
    • delete the rest