Search code examples
sqlpostgresqlaggregate-functionspostgresql-performance

Use the results of SUM in the same query


SELECT 
business_period,
SUM(transaction.transaction_value) AS total_transaction_value,
SUM(transaction.loss_value) AS total_loss_value,
(total_transaction_value - total_loss_value) AS net_value
FROM transaction
GROUP BY business_period

The above does not work as total_transaction_value and total_loss_value are not from the transaction table. Is there a way to make this query work?

Note: this query involves 500 million rows, so need to efficient.

Question:
Some answers have suggested that SUM(transaction.transaction_value) - SUM(transaction.loss_value) is cached and won't need to be computed again where as others are suggesting that I should as a derived table / subsequery to avoid repeated computation. Could someone point to something that could settle the difference in opinion?

I am using postgres 9.3.

ANSWER:

I want to quote erwin's comment here:

I ran a quick test with 40k rows and the winner was the plain version without subquery. CTE was slowest. So I think my first assumption was wrong and the query planner understands not to calculate the sums repeatedly (makes sense, too). I have seen different results with more complex expressions in the past. The planner does get smarter with every new version


Solution

  • Use:

    SELECT 
    business_period,
    SUM(transaction.transaction_value) AS total_transaction_value,
    SUM(transaction.loss_value) AS total_loss_value,
    (SUM(transaction.transaction_value) - SUM(transaction.loss_value)) AS net_value
    FROM transaction
    GROUP BY business_period