Search code examples
mysqlpdoquery-optimization

Which PDO SQL Query is faster in the long run and heavy data?


From a Table has over a million record, When i pull the data from it,

I want to check if the requested data exists or not, So which path is more efficient and faster then the other?

$Query = '
    SELECT n.id 
    FROM names n 
    INNER JOIN ages a ON n.id = a.aid 
    INNER JOIN regions r ON n.id = r.rid 
    WHERE id = :id
';

$stmt->prepare($Query);
$stmt->execute(['id' => $id]);
if ($stmt->rowCount() == 1) {
    $row = $stmt->fetch();
    ......................
} else {
    exit();
}

or

$EXISTS = 'SELECT EXISTS (
    SELECT n.fname, n.lname, a.age, r.region 
    FROM names n 
    INNER JOIN ages a ON n.id = a.aid 
    INNER JOIN regions r ON n.id = r.rid 
    WHERE id = :id
    LIMIT 1
)
';
$stmt->prepare($EXISTS);
$stmt->execute(['id' => $id]);
if ($stmt->fetchColumn() == 1) {
    $stmt->prepare($Query);
    $stmt->execute(['id' => $id]);
    $row = $stmt->fetch();
    ......................
} else {
    exit();
}

keeping in mind that id is PRIMARY (INT) and aid, rid are INDEXED (INT)


Solution

  • The two methods you show are almost certainly equivalent, with almost no measurable difference in performance.

    SELECT n.id 
    FROM names n 
    INNER JOIN ages a ON n.id = a.aid 
    INNER JOIN regions r ON n.id = r.rid 
    WHERE id = :id
    

    I assume names.id is the primary key of that table. A primary key lookup is very fast.

    Then it will do a secondary key lookup to the other two tables, and it will be an index-only access because there's no reference to other columns of those tables.

    You should learn how to use EXPLAIN to analyze MySQL's optimization plan. This is a skill you should practice any time you want to improve the performance of an SQL query.

    See https://dev.mysql.com/doc/refman/5.7/en/using-explain.html

    mysql> explain SELECT n.id 
        ->     FROM names n 
        ->     INNER JOIN ages a ON n.id = a.aid 
        ->     INNER JOIN regions r ON n.id = r.rid 
        ->     WHERE id = 1;
    
    +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
    | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
    +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
    |  1 | SIMPLE      | n     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using index |
    |  1 | SIMPLE      | a     | NULL       | ref   | aid           | aid     | 5       | const |    1 |   100.00 | Using index |
    |  1 | SIMPLE      | r     | NULL       | ref   | rid           | rid     | 5       | const |    1 |   100.00 | Using index |
    +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
    

    We see that each table access is using an index (I'm assuming indexes though you did not provide your SHOW CREATE TABLE in your question).

    Compare to the second solution with SELECT EXISTS(...)

    mysql> explain SELECT EXISTS (
        ->     SELECT n.id 
        ->     FROM names n 
        ->     INNER JOIN ages a ON n.id = a.aid 
        ->     INNER JOIN regions r ON n.id = r.rid 
        ->     WHERE id = 1 
        ->     LIMIT 1);
    
    +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+----------------+
    | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra          |
    +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+----------------+
    |  1 | PRIMARY     | NULL  | NULL       | NULL  | NULL          | NULL    | NULL    | NULL  | NULL |     NULL | No tables used |
    |  2 | SUBQUERY    | n     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using index    |
    |  2 | SUBQUERY    | a     | NULL       | ref   | aid           | aid     | 5       | const |    1 |   100.00 | Using index    |
    |  2 | SUBQUERY    | r     | NULL       | ref   | rid           | rid     | 5       | const |    1 |   100.00 | Using index    |
    +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+----------------+
    

    The subquery looks identical to the first query optimization plan; it still uses indexes in the same way. But it's relegated to a subquery. Probably not a big difference, but it's one more thing.

    The only advantage is that the SELECT EXISTS... query is guaranteed to return just one row with a true/false value. The first query might return a result set with zero, one, or many rows, depending how many matched the JOINs in the query. The difference is not a performance difference (unless it returns so many rows that it takes time to transfer the result set to the client, or uses a lot of memory to hold the result set in the client), but just a matter of convenience for the way you code it.