Search code examples
sqldatabasepostgresqlaggregate-functions

What is the difference between direct addition (1 + 1) vs aggregate function SUM(1 + 1)


See the query below where I get the reputation value from direct addition without a function.

SELECT
    "User"."id",
    "User"."name",
    "User"."title",
    "User"."about",
    "User"."location",
    "User"."isModerator",
    "User"."createdAt",
    (
--      question vote
        COALESCE(SUM("PostVote"."value" * 10) FILTER (WHERE "value" = 1 AND "Post"."questionId" IS NULL), 0)::int 
        +
        COALESCE(SUM("PostVote"."value" * 2) FILTER (WHERE "value" = -1 AND "Post"."questionId" IS NULL), 0)::int 
        +
--      answer vote
        COALESCE(SUM("PostVote"."value" * 10) FILTER (WHERE "value" = 1 AND "Post"."questionId" IS NOT NULL), 0)::int 
        +
        COALESCE(SUM("PostVote"."value" * 2) FILTER (WHERE "value" = -1 AND "Post"."questionId" IS NOT NULL), 0)::int 
        +
--      accepted answer
        (COALESCE(COUNT("answer"."id"), 0)::int * 15)
    ) as reputation
FROM "User"
LEFT JOIN "Post" ON "Post"."authorId" = "User"."id"
LEFT JOIN "PostVote" ON "PostVote"."postId" = "Post"."id"
LEFT JOIN "Post" as answer ON "Post"."acceptedAnswerId" = "answer"."id"
GROUP BY "User"."id"
ORDER BY reputation DESC, id

I did this because SUM cannot be nested, I originally wanted to do something like this.

-- ...
    SUM(
--      question vote
        COALESCE(SUM("PostVote"."value" * 10) FILTER (WHERE "value" = 1 AND "Post"."questionId" IS NULL), 0)::int 
        +
        COALESCE(SUM("PostVote"."value" * 2) FILTER (WHERE "value" = -1 AND "Post"."questionId" IS NULL), 0)::int 
        +
--      answer vote
        COALESCE(SUM("PostVote"."value" * 10) FILTER (WHERE "value" = 1 AND "Post"."questionId" IS NOT NULL), 0)::int 
        +
        COALESCE(SUM("PostVote"."value" * 2) FILTER (WHERE "value" = -1 AND "Post"."questionId" IS NOT NULL), 0)::int 
        +
--      accepted answer
        (COALESCE(COUNT("answer"."id"), 0)::int * 15)
    ) as reputation
-- ...

I'm curious if there are any performance impacts or summations that might potentially result in incorrect values? I have tested with negative values and it works fine, maybe there is something I missed.

I could have used the SUM function by nesting the queries, but I thought this would be too verbose.


Solution

  • I could have used the SUM function by nesting the queries ...

    No you couldn't have. You are missing the fundamental difference between an aggregate function (sum()) and a plain operator (+). ruakh's answer already goes into detail.

    But since you expressed concern for performance impacts: you can optimize your use of count() which never returns null. See:

    Assuming referential integrity, there is no need to actually join to the accepted answer. Just count not-null values in p."acceptedAnswerId". (I am not convinced you are counting "accepts" correctly in the first place, but I stuck to your logic.)

    Actually looks like you can replace all instances of sum() with a faster count(). Consequently, you need no COALESCE at all:

    SELECT u.id, u.name, u.title, u.about, u.location, u."isModerator", u."createdAt"
         ,( -- q & a upvote
           count(*) FILTER (WHERE pv."value" =  1)::int * 10
           -- q & a downvote 
         + count(*) FILTER (WHERE pv."value" = -1)::int * -2
           -- accepted answer
         + count(p."acceptedAnswerId")::int * 15  -- cheaper
          ) AS reputation
    FROM   "User"          u
    LEFT   JOIN "Post"     p  ON p."authorId" = u.id
    LEFT   JOIN "PostVote" pv ON pv."postId" = p.id
    -- LEFT JOIN "Post"     a  ON a.id = p."acceptedAnswerId"  -- noise?! 
    GROUP  BY u.id
    ORDER  BY reputation DESC, u.id;