Search code examples
ruby-on-railsjoinactiverecordsumhas-many

Sum attribute from model that has associated models that match certain condition


In a Ruby on Rails project, 4.0, I have a Payments model that has_many Transactions model.

If I have the following data

SELECT * FROM payments;
id | paid_amount
---+--------------
1  | 200
2  | 300
3  | 100
4  | 400
5  | 100
6  | 600
7  | 100
8  | 100
9  | 800

SELECT * FROM transactions;
id | payment_id  | type
---+-------------+------
1  | 2           | cash
2  | 3           | credit
3  | 1           | credit
4  | 4           | cash
5  | 1           | cash
6  | 6           | credit
7  | 1           | cash
8  | 1           | credit
9  | 8           | cash

Right now, I'm calculating the sum of an numeric attribute paid_amount from Payments, that has a transaction of certain type, as follows

> Payment.where(id: Transaction.where(type: 'cash').pluck(:payment_id)).sum(:paid_amount)
> 1000
> # The sum of paid_amount from payments with ids 1, 2, 4 and 8

But this doesn't work fast enough with many thousands of records so I tried to accomplish this with includes without any luck.

> Payment.includes(:transactions).where("transactions.type = 'cash'").sum(:paid_amount)
> 1200
> # paid_amount of payment with id 1 is considered two times because of transactions with id 5 and 7

Any thoughts on how should I calculate the number I need?


Solution

  • Your first query:

    Payment.where(id: Transaction.where(type: 'cash').pluck(:payment_id)).sum(:paid_amount)
    

    is slow partly because Transaction.where(type: 'cash').pluck(:payment_id) is a query that returns an array and then you send that array back to the database with an IN (big_list_of_integers) in the SQL. A slight modification will use a subquery instead, just switch the pluck to select:

    Payment.where(id: Transaction.where(type: 'cash').select(:payment_id)).sum(:paid_amount)
    

    that will result in one query like:

    select sum(payments.paid_amount)
    from payments
    where payments.id in (
      select payment_id
      from transactions
      where transactions.type = ...
    )
    

    being sent to the database. This should be quite a bit faster with large lists.