So I have two tables, 'videos' and 'video_likes'. The 'videos' table looks like this:
id | creator | url | likes
1 | 5 |https://... | 10
2 | 8 |https://... | 20
3 | 4 |https://... | 30
4 | 2 |https://... | 40
And 'video_likes' table looks like this:
id | video_id | like_user_id
1 | 2 | 8
2 | 2 | 5
3 | 4 | 2
As you can see both are basic tables. Ok Now here is the fun part. When a user likes/dislikes a post, instead of incrementing and decrementing likes' from 'videos' table, I just update them to be as same as the count of rows in the 'video_likes' table, like this:
UPDATE videos SET likes = (SELECT COUNT(id) FROM video_likes WHERE video_id = '$video_id') WHERE id = '$video_id'
When it is time to fetch the no. of likes of the video, I just do this:
SELECT likes FROM videos WHERE id = '$video_id' LIMIT 1
The problem is that the update query is called each time a user likes or dislikes a video, which could be very expensive I guess. I have three questions related to this problem:
1) is there a performance issue when the 'video_likes' table grows to a large number, say a million or a billion?
2) Does sql lock the table while updating the count? If yes, will a new insert (a user liking the post while the update is going on) fail? If so, how to prevent this?
3) What is the most fastest and efficient way to achieve the same result? I don't want to increment or decrement the likes field, because this will be just a denormalization and there could be inconsistencies in the actual likes count. What would be the right way to do this?
These are a few questions that has been troubling me for the past few days. Hope you'll answer. Regards.
I recommend
INSERT
into the Likes table.SELECT likes FROM videos WHERE id = '$video_id' LIMIT 1
does not make sense -- is there more than one row for a video in the videos
table? Why? If so, which one (limit 1
) do you want?video_likes
does not need an id
. Instead, PRIMARY KEY(video_id, user_id)
. This will let you prevent a user from repeatedly bumping the count. (However, this needs checking before doing the increment.)Dislike
If "Dislike" is not a feature, but "Un-Like" is, I suggest "Un-Like" be implemented something like
START TRANSACTION;
DELETE FROM video_likes WHERE ...;
UPDATE videos SET like_ct = like_ct - 1 WHERE ...;
COMMIT;
That way, the two "like" mechanism would stay perfectly in sync. This is, to some extent, a mirror image of the "Like" code.
LIMIT 1
Think of LIMIT
only for controlling how much output you get.
LIMIT 1
doesn't change how long the query takes.ORDER BY
) to decide which is the first/last/whatever, then it has already done a lot of work; the LIMIT
merely cuts back on the number of rows delivered.INDEX
can be used for the ORDER BY
, the sort might vanish. Hence the LIMIT
controls the number of rows delivered.AUTO_INCREMENT or not
PRIMARY KEY
.id INT UNSIGNED NOT NULL AUTO_INCREMENT
is useful for tables that don't have a "natural" PK. Or the natural PK is a long string.In that example with a 4 byte id and a 2-byte country-code, you need two indexes:
PRIMARY KEY(id), INDEX(country_code)
versus one:
PRIMARY KEY(country_code)
And the table with id
is bigger. (Though not enough bigger in this example to really matter.)
In the case of a "many:many", such as your video_likes
, the id
hurts performance. I discuss that here.