ruby-on-rails join activerecord sum has-many

Sum attribute from model that has associated models that match certain condition

In a Ruby on Rails project, 4.0, I have a Payments model that has_many Transactions model.

If I have the following data

SELECT * FROM payments;
id | paid_amount
---+--------------
1  | 200
2  | 300
3  | 100
4  | 400
5  | 100
6  | 600
7  | 100
8  | 100
9  | 800

SELECT * FROM transactions;
id | payment_id  | type
---+-------------+------
1  | 2           | cash
2  | 3           | credit
3  | 1           | credit
4  | 4           | cash
5  | 1           | cash
6  | 6           | credit
7  | 1           | cash
8  | 1           | credit
9  | 8           | cash

Right now, I'm calculating the sum of an numeric attribute paid_amount from Payments, that has a transaction of certain type, as follows

> Payment.where(id: Transaction.where(type: 'cash').pluck(:payment_id)).sum(:paid_amount)
> 1000
> # The sum of paid_amount from payments with ids 1, 2, 4 and 8

But this doesn't work fast enough with many thousands of records so I tried to accomplish this with includes without any luck.

> Payment.includes(:transactions).where("transactions.type = 'cash'").sum(:paid_amount)
> 1200
> # paid_amount of payment with id 1 is considered two times because of transactions with id 5 and 7

Any thoughts on how should I calculate the number I need?

Solution

Your first query:

Payment.where(id: Transaction.where(type: 'cash').pluck(:payment_id)).sum(:paid_amount)

is slow partly because Transaction.where(type: 'cash').pluck(:payment_id) is a query that returns an array and then you send that array back to the database with an IN (big_list_of_integers) in the SQL. A slight modification will use a subquery instead, just switch the pluck to select:

Payment.where(id: Transaction.where(type: 'cash').select(:payment_id)).sum(:paid_amount)

that will result in one query like:

select sum(payments.paid_amount)
from payments
where payments.id in (
  select payment_id
  from transactions
  where transactions.type = ...
)

being sent to the database. This should be quite a bit faster with large lists.