Search code examples
mysqlsqlperformancesubquerysql-limit

Performance considerations for SUB QUERY, LIMIT and ORDER BY when using MySQL


Objective: Select 10 users whose ongoing games is < 5.

Approach 1: Do everything within MySQL

Select users (LIMIT 10) and check n_ongoingGames for each user

Approach 2: SQL + result-set analysis

Get records for say 30 users. For each record, go to db again and check for n_ongoingGames. Assuming probability of (n_ongoingGames > 5) for each user as 50%, we'll need to check for 20 records.

-- Approach 1: Do everything within MySQL
-- --------------------------------------
SELECT u.*
FROM USERS u
WHERE 
    ( -- n_ongoingGames for user < 5
        (
            SELECT COUNT(gameID) AS n_ongoingGames
            FROM games g
            WHERE 
                (g.player1_userID = u.userID) OR 
                (g.player2_userID = u.userID) OR
                (g.player3_userID = u.userID) OR
                ...
                ...
                ...
                (g.player10_userID = u.userID)
        ) < 5
    ) AND
    (u.cash_balance > 500)
ORDER BY u.user_rating
LIMIT 10;


-- Approach 2: SQL + result-set analysis
-- -------------------------------------
SELECT u.*
FROM USERS u
WHERE
    (u.cash_balance > 500)
ORDER BY u.user_rating
LIMIT 30;

Questions:

  1. In approach 1, does MySQL check (n_ongoingGames for user < 5) for every user in the users table ?
  2. Which approach is faster ?
  3. Does it make a difference (on which approach is faster) if only 1 user is needed instead of 10 (LIMIT 1)

Thanks.


Solution

  • I would suggest the following, assuming a user cannot be both players:

    SELECT u.*
    FROM USERS u
    WHERE u.cash_balance > 500 AND
          ((SELECT COUNT(*) AS n_ongoingGames
            FROM games g
            WHERE g.player1_userID = u.userID
           ) +
           (SELECT COUNT(*) AS n_ongoingGames
            FROM games g
            WHERE g.player2_userID = u.userID
           )
          ) < 5
    ORDER BY u.user_rating
    LIMIT 10;
    

    With the following indexes: games(player1_userID) and games(player2_userID). You also want an index on users; one possibility is users(user_rating, cash_balance), but I don't think MySQL will be smart enough to scan the table in order using the index. You might have to settle for users(cash_balance).

    The indexes on games() mean that the counts can be satisfied in the indexes. This should be a particularly fast type of counting. If you remove the condition on cash balance, then an index on users(user_rating) should make the query quite fast.