Search code examples
sqlruby-on-railsscale

Have more than 400 000, repopulating the DB takes 5 hours


Simply running

ElectricityProfile.find_each do |ep|
  if UserProfile.exists?(ep.owner_id) && ep.owner_type == 'UserProfile'
    ElectricityProfileSummary.create(ep)
  end
end

Takes ages (5 hours) to populate the table. Is there any better way to populate the DB?

Lets say get all the data from the DB and store it in array, hash, etc and then push to create a DB

ElectricityProfile.find_each do |ep|
  if UserProfile.exists?(ep.owner_id) && ep.owner_type == 'UserProfile'
    array_of_electricity_profiles.push(ep)
  end
end
ElectricityProfileSummary.mass_create(ep) # => or any other method :)

Sorry forgot mention I do have overridden method create, that takes multiple models and creates ElectricityProfileSummary...

create!(:owner_id => electricity_profile.owner_id,
        :owner_type => electricity_profile.owner_type,
        :property_type => electricity_profile.owner.user_property_type,
        :household_size => electricity_profile.owner.user_num_of_people,
        :has_swimming_pool => electricity_profile.has_swimming_pool,
        :bill => electricity_bill,
        :contract => electricity_profile.on_contract,
        :dirty => true,
        :provider => electricity_profile.supplier_id,
        :plan => electricity_profile.plan_id,
        :state => state,
        :postcode => postcode,
        :discount => discount,
        :has_air_conditioner => electricity_profile.has_air_conditioner,
        :has_electric_hot_water => electricity_profile.has_electric_hot_water,
        :has_electric_central_heating => electricity_profile.has_electric_central_heating,
        :has_electric_cooktup => electricity_profile.has_electric_cooktup
)

Solution

  • Doing this in a stored procedure or raw SQL would probably be the best way to go since ActiveRecord can be very expensive when dealing with that many records. However, you can speed it up quite a bit by using includes or joins.

    It looks like you only want to create ElectricityProfileSummary models. I am a little unsure of how your relationships look, but assuming you have the following:

    class ElectricityProfile
      belongs_to :owner, polymorphic: true
    end
    
    class UserProfile
      has_many :electricity_profiles, as: owner
    end
    

    ... you should be able to do something like this:

    ElectricityProfile.includes(:owner).each do |ep|
      ElectricityProfileSummary.create(ep)
    end
    

    Now, I am basing this on the assumption that you are using a polymorphic relationship between ElectricityProfile and UserProfile. If that is not the case, let me know. (I made the assumption because you have owner_id and owner_type, which as a pair make up the two fields necessary for polymorphic relationships.)

    Why is using an includes better? Using includes causes ActiveRecord to eager load the relationship between the two models, so you're not doing n+1 queries like you are now. Actually, because you are creating records based on the number of ElectricityProfile records, you're still doing n+1, but what you are doing now is more expensive than n+1 because you are querying UserProfile for every single ElectricityProfile, and then you are querying UserProfile again when creating the ElectricityProfileSummary because you are lazy loading the relationship between EP and UP.

    When you do includes, Rails will use an inner join to query between the two tables. Using an inner join eliminates the necessity to do ensure that the UserProfile exists, since the inner join will only return records where both sides of the relationship exist.