Search code examples
mysqlquery-optimizationdatabase-performance

What's the efficient way to retrieve "likes" of a post in sql?


So I have two tables, 'videos' and 'video_likes'. The 'videos' table looks like this:

id | creator | url        | likes
1  |  5      |https://... |  10
2  |  8      |https://... |  20
3  |  4      |https://... |  30
4  |  2      |https://... |  40

And 'video_likes' table looks like this:

id | video_id | like_user_id
1  |  2       |   8
2  |  2       |   5
3  |  4       |   2

As you can see both are basic tables. Ok Now here is the fun part. When a user likes/dislikes a post, instead of incrementing and decrementing likes' from 'videos' table, I just update them to be as same as the count of rows in the 'video_likes' table, like this:

UPDATE videos SET likes = (SELECT COUNT(id) FROM video_likes WHERE video_id = '$video_id') WHERE id = '$video_id'

When it is time to fetch the no. of likes of the video, I just do this:

SELECT likes FROM videos WHERE id = '$video_id' LIMIT 1

The problem is that the update query is called each time a user likes or dislikes a video, which could be very expensive I guess. I have three questions related to this problem:

1) is there a performance issue when the 'video_likes' table grows to a large number, say a million or a billion?

2) Does sql lock the table while updating the count? If yes, will a new insert (a user liking the post while the update is going on) fail? If so, how to prevent this?

3) What is the most fastest and efficient way to achieve the same result? I don't want to increment or decrement the likes field, because this will be just a denormalization and there could be inconsistencies in the actual likes count. What would be the right way to do this?
These are a few questions that has been troubling me for the past few days. Hope you'll answer. Regards.


Solution

  • I recommend

    • Don't recompute; that will just get slower and slower.
    • Increment the counter at the same time you INSERT into the Likes table.
    • When you get so busy that there are hundreds of writes (Inserts, Updates) per second, split out the like counter into a parallel table ("vertical partitioning"). This will help avoid conflicts due to the row lock (not table lock) when doing increment a single video like-count, versus other accesses to that table.
    • SELECT likes FROM videos WHERE id = '$video_id' LIMIT 1 does not make sense -- is there more than one row for a video in the videos table? Why? If so, which one (limit 1) do you want?
    • Did you also want a "dislike" counter? Or will you simply 'decrement'. Think about it.
    • video_likes does not need an id. Instead, PRIMARY KEY(video_id, user_id). This will let you prevent a user from repeatedly bumping the count. (However, this needs checking before doing the increment.)
    • etc.

    Dislike

    If "Dislike" is not a feature, but "Un-Like" is, I suggest "Un-Like" be implemented something like

    START TRANSACTION;
    DELETE FROM video_likes WHERE ...;
    UPDATE videos SET like_ct = like_ct - 1 WHERE ...;
    COMMIT;
    

    That way, the two "like" mechanism would stay perfectly in sync. This is, to some extent, a mirror image of the "Like" code.

    LIMIT 1

    Think of LIMIT only for controlling how much output you get.

    • When there is only one row anyway, LIMIT 1 doesn't change how long the query takes.
    • When the query has to sort (ORDER BY) to decide which is the first/last/whatever, then it has already done a lot of work; the LIMIT merely cuts back on the number of rows delivered.
    • On the other hand, when an INDEX can be used for the ORDER BY, the sort might vanish. Hence the LIMIT controls the number of rows delivered.

    AUTO_INCREMENT or not

    • A table must have a PRIMARY KEY.
    • If you don't explicitly specify a PK, one is generated for you. (Not advised.)
    • A "natural" PK is on that occurs naturally. Example, a table of countries could (should) use the standard 2-letter "country codes" -- US, FR, IN, RU, CN, CH, IT, etc.
    • id INT UNSIGNED NOT NULL AUTO_INCREMENT is useful for tables that don't have a "natural" PK. Or the natural PK is a long string.

    In that example with a 4 byte id and a 2-byte country-code, you need two indexes:

    PRIMARY KEY(id), INDEX(country_code)
    

    versus one:

    PRIMARY KEY(country_code)
    

    And the table with id is bigger. (Though not enough bigger in this example to really matter.)

    In the case of a "many:many", such as your video_likes, the id hurts performance. I discuss that here.