Search code examples
sqlpostgresqlsumdistinct

Find the sum of values from distinct groups


I have a table like below.

id account_id sha256 size
1 1 abc 120
2 1 abc 120
3 1 bcd 150
4 2 abc 120
5 2 def 80
6 3 fed 100
7 3 fed 100

I need to find the sum of the size column but same sha256 for an account should be added only once. Rows to be added should as below.

id account_id sha256 size
1 1 abc 120
3 1 bcd 150
4 2 abc 120
5 2 def 80
6 3 fed 100

Row number 2 and 7 is removed due to duplicate sha256 value per same account. Row 4 is not removed as it belongs to a different account even though it has the same sha256, and sum should be 570.

Tried below query, but giving a syntax error at or near "distinct".

SELECT SUM(f.size) FROM 
(SELECT account_id, DISTINCT sha256, size FROM files GROUP BY account_id, sha256, size) f

Solution

  • Use DISTINCT ON:

    SELECT SUM(size) AS total
    FROM
    (
        SELECT DISTINCT ON (account_id, sha256, size) size
        FROM FILES
        ORDER BY account_id, sha256, size, id
    ) t;
    

    The above logic retains, for each group of (account_id, sha256, size) values, a single record corresponding to the lowest id value. This set of records is then summed by size to get the total.