Search code examples
postgresqltriggerstransactionsplpgsqlatomic

Getting incorrect total number of likes and dislikes sometimes


I have a table feed_item_likes_dislikes in PostgresQL (feed_item_id, user_id, vote) where

  • feed_item_id is uuid
  • user_id is integer
  • vote = TRUE then it is a like
  • vote = FALSE then it is a dislike
  • vote = NULL meaning the user originally liked or disliked but came back and removed the vote by unvoting

I have another table feed_item_likes_dislikes_aggregate(feed_item_id, likes, dislikes) where I want to maintain total number of likes dislikes per post

when the user adds a new like in the feed_item_likes_dislikes table with

INSERT INTO feed_item_likes_dislikes VALUES('54d67b62-9b71-a6bc-d934-451c1eaae3bc', 1, TRUE);

I want to update the total number of likes in the aggregate table. Similar case needs to be handled for dislikes and when user unvotes something by setting vote to null

User may also update their like to a dislike and vice versa and in every condition, the total number of likes and dislikes for that post needs to be maintained

I wrote the following trigger function to accomplish this

CREATE OR REPLACE FUNCTION update_votes() RETURNS trigger AS $$
    DECLARE
        feed_item_id_val uuid;
        likes_val integer;
        dislikes_val integer;
    BEGIN
        IF (TG_OP = 'DELETE') THEN
            -- when a row is deleted, store feed_item_id of the deleted row so that we can update its likes and dislikes count
            feed_item_id_val:=OLD.feed_item_id;
        ELSIF (TG_OP = 'UPDATE') OR (TG_OP='INSERT') THEN
            feed_item_id_val:=NEW.feed_item_id;
        END IF;
        -- get total number of likes and dislikes for the given feed_item_id
        SELECT COUNT(*) FILTER(WHERE vote=TRUE) AS likes, COUNT(*) FILTER(WHERE vote=FALSE) AS dislikes INTO likes_val, dislikes_val FROM feed_item_likes_dislikes WHERE feed_item_id=feed_item_id_val;
        -- update the aggregate count for only this feed_item_id
        INSERT INTO feed_item_likes_dislikes_aggregate (feed_item_id, likes, dislikes) VALUES (feed_item_id_val, likes_val, dislikes_val) ON CONFLICT(feed_item_id) DO UPDATE SET likes=likes_val, dislikes=dislikes_val;
        RETURN NULL; -- result is ignored since this is an AFTER trigger
    END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_votes_trigger AFTER INSERT OR UPDATE OR DELETE ON feed_item_likes_dislikes FOR EACH ROW EXECUTE PROCEDURE update_votes();

But when I do a bulk insert into feed_item_likes_dislikes, sometimes the total number of likes and dislikes is incorrect.

Can someone kindly tell me how I can fix this?

Update 1

I tried creating a view but it takes a lot of time on my production data set, here is the db fiddle https://www.db-fiddle.com/f/2ZAkjQhUydMaV9o5xvLgMT/17

Query #1

EXPLAIN ANALYZE SELECT f.feed_item_id,pubdate,link,guid,title,summary,author,feed_id,COALESCE(likes, 0) AS likes,COALESCE(dislikes, 0) AS dislikes,COALESCE(bullish, 0) AS bullish,COALESCE(bearish, 0) AS bearish FROM feed_items f LEFT JOIN likes_dislikes_aggregate l ON f.feed_item_id = l.feed_item_id LEFT JOIN bullish_bearish_aggregate b ON f.feed_item_id = b.feed_item_id  ORDER BY pubdate DESC, f.feed_item_id DESC LIMIT 10;
QUERY PLAN
Limit (cost=112.18..112.21 rows=10 width=238) (actual time=0.257..0.260 rows=10 loops=1)
-> Sort (cost=112.18..112.93 rows=300 width=238) (actual time=0.257..0.257 rows=10 loops=1)
Sort Key: f.pubdate DESC, f.feed_item_id DESC
Sort Method: top-N heapsort Memory: 27kB
-> Hash Left Join (cost=91.10..105.70 rows=300 width=238) (actual time=0.162..0.222 rows=100 loops=1)
Hash Cond: (f.feed_item_id = b.feed_item_id)
-> Hash Left Join (cost=45.55..59.35 rows=300 width=222) (actual time=0.080..0.114 rows=100 loops=1)
Hash Cond: (f.feed_item_id = l.feed_item_id)
-> Seq Scan on feed_items f (cost=0.00..13.00 rows=300 width=206) (actual time=0.004..0.011 rows=100 loops=1)
-> Hash (cost=43.05..43.05 rows=200 width=32) (actual time=0.069..0.069 rows=59 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Subquery Scan on l (cost=39.05..43.05 rows=200 width=32) (actual time=0.037..0.052 rows=59 loops=1)
-> HashAggregate (cost=39.05..41.05 rows=200 width=32) (actual time=0.036..0.046 rows=59 loops=1)
Group Key: feed_item_likes_dislikes.feed_item_id
-> Seq Scan on feed_item_likes_dislikes (cost=0.00..26.60 rows=1660 width=17) (actual time=0.003..0.008 rows=95 loops=1)
-> Hash (cost=43.05..43.05 rows=200 width=32) (actual time=0.064..0.064 rows=63 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Subquery Scan on b (cost=39.05..43.05 rows=200 width=32) (actual time=0.029..0.044 rows=63 loops=1)
-> HashAggregate (cost=39.05..41.05 rows=200 width=32) (actual time=0.028..0.038 rows=63 loops=1)
Group Key: feed_item_bullish_bearish.feed_item_id
-> Seq Scan on feed_item_bullish_bearish (cost=0.00..26.60 rows=1660 width=17) (actual time=0.002..0.007 rows=93 loops=1)
Planning Time: 0.140 ms
Execution Time: 0.328 ms

View on DB Fiddle


Solution

  • The attempt to keep a running aggregate is always riddled with traps, and is almost always not worth the effort. The solution is to not try to store aggregates but to derive them as needed. You do this by creating a VIEW rather that a table. This then removes all additional processing, especially so in this case as your trigger basically contains the query needed to generate the view. ( see Demo here )

    create or replace VIEW likes_dislikes_aggregate as 
       select id
            , count(*) filter(where vote) as likes
            , count(*) filter(where not vote) as dislikes   
            , count(*) filter(where vote is null) as no_vote
         from likes_dislikes
        group by id;
    

    No trigger, no additional code everything with the view through standard DML. Notice that the entity view is basically nothing but your count query without the trigger overhead and maintenance.

    SELECT COUNT(*) FILTER(WHERE vote=TRUE) AS likes, COUNT(*) FILTER(WHERE vote=FALSE) AS dislikes INTO likes_val, dislikes_val FROM feed_item_likes_dislikes WHERE feed_item_id=feed_item_id_val;