I have a table like below.
id | account_id | sha256 | size |
---|---|---|---|
1 | 1 | abc | 120 |
2 | 1 | abc | 120 |
3 | 1 | bcd | 150 |
4 | 2 | abc | 120 |
5 | 2 | def | 80 |
6 | 3 | fed | 100 |
7 | 3 | fed | 100 |
I need to find the sum of the size column but same sha256 for an account should be added only once. Rows to be added should as below.
id | account_id | sha256 | size |
---|---|---|---|
1 | 1 | abc | 120 |
3 | 1 | bcd | 150 |
4 | 2 | abc | 120 |
5 | 2 | def | 80 |
6 | 3 | fed | 100 |
Row number 2 and 7 is removed due to duplicate sha256 value per same account. Row 4 is not removed as it belongs to a different account even though it has the same sha256, and sum should be 570.
Tried below query, but giving a syntax error at or near "distinct".
SELECT SUM(f.size) FROM
(SELECT account_id, DISTINCT sha256, size FROM files GROUP BY account_id, sha256, size) f
Use DISTINCT ON
:
SELECT SUM(size) AS total
FROM
(
SELECT DISTINCT ON (account_id, sha256, size) size
FROM FILES
ORDER BY account_id, sha256, size, id
) t;
The above logic retains, for each group of (account_id, sha256, size)
values, a single record corresponding to the lowest id
value. This set of records is then summed by size to get the total.