Search code examples
mysqljoinoptimizationsubquerydatabase-administration

Optimizing join on derived table - EXPLAIN different on local and server


I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of MySQL (v5.5), the query just hangs. It seems to get caught on "Copying to tmp table":

SELECT
  SQL_CALC_FOUND_ROWS
  DISTINCT p.parcel_number,
  p.street_number,
  p.street_name,
  p.site_address_city_state,
  p.number_of_units,
  p.number_of_stories,
  p.bedrooms,
  p.bathrooms,
  p.lot_area_sqft,
  p.cost_per_sq_ft,
  p.year_built,
  p.sales_date,
  p.sales_price,
  p.id
  FROM (
    SELECT APN, property_case_detail_id FROM property_inspection AS pi
      GROUP BY APN, property_case_detail_id
      HAVING 
      COUNT(IF(status='Resolved Date', 1, NULL)) = 0
    ) as open_cases
  JOIN property AS p
  ON p.parcel_number = open_cases.APN
  LIMIT 0, 1000;

mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id    | User        | Host      | db           | Command | Time | State                | Info                                                                                                 |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query   |   21 | Copying to tmp table | SELECT
          SQL_CALC_FOUND_ROWS
          DISTINCT p.parcel_number,
          p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query   |    0 | NULL                 | show processlist                                                                                     |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Explains are different on my local machine and on the server, and I'm assuming the only reason my query runs at all on my local machine, is because of the key that is automatically created on the derived table:

Explain (local):

+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref                          | rows    | filtered | Extra                           |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
|  1 | PRIMARY     | p          | NULL       | ALL  | NULL          | NULL        | NULL    | NULL                         |   40319 |   100.00 | Using temporary                 |
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 8       | lead_housing.p.parcel_number |      40 |   100.00 | NULL                            |
|  2 | DERIVED     | pi         | NULL       | ALL  | NULL          | NULL        | NULL    | NULL                         | 1623978 |   100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+

Explain (server):

+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra                                    |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
|  1 | PRIMARY     | p          | ALL  | NULL          | NULL | NULL    | NULL |   41369 | Using temporary                          |
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |  122948 | Using where; Distinct; Using join buffer |
|  2 | DERIVED     | pi         | ALL  | NULL          | NULL | NULL    | NULL | 1718586 | Using temporary; Using filesort          |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+

Schemas:

mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field                   | Type         | Null | Key | Default           | Extra                       |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id                      | int(11)      | NO   | PRI | NULL              | auto_increment              |
| lblCaseNo               | int(11)      | NO   | MUL | NULL              |                             |
| APN                     | bigint(10)   | NO   | MUL | NULL              |                             |
| date                    | varchar(50)  | NO   |     | NULL              |                             |
| status                  | varchar(500) | NO   |     | NULL              |                             |
| property_case_detail_id | int(11)      | YES  | MUL | NULL              |                             |
| case_type_id            | int(11)      | YES  | MUL | NULL              |                             |
| date_modified           | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status           | tinyint(1)   | YES  |     | 1                 |                             |
| created_date            | datetime     | NO   |     | NULL              |                             |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)

mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field                      | Type         | Null | Key | Default           | Extra                       |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id                         | int(11)      | NO   | PRI | NULL              | auto_increment              |
| parcel_number              | bigint(10)   | NO   |     | 0                  |                             |
| date_modified              | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date               | datetime     | NO   |     | NULL              |                             |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+

Variables that might be relevant:

tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592

Any ideas on how to optimize this, and any idea why the explains are so different?


Solution

  • Since this is where the Optimizers are quite different, let's try to optimize

    SELECT APN, property_case_detail_id FROM property_inspection AS pi
      GROUP BY APN, property_case_detail_id
      HAVING 
      COUNT(IF(status='Resolved Date', 1, NULL)) = 0
    ) as open_cases
    

    Give this a try:

    SELECT ...
        FROM property AS p
        WHERE NOT EXISTS ( SELECT 1 FROM property_inspection
                     WHERE status = 'Resolved Date'
                       AND p.parcel_number = APN )
        ORDER BY ???  -- without this, the `LIMIT` is unpredictable
        LIMIT 0, 1000;
    

    or...

    SELECT ...
        FROM property AS p
        LEFT JOIN  property_inspection AS pi  ON p.parcel_number = pi.APN
        WHERE pi.status = 'Resolved Date'
          AND pi.APN IS NULL
        ORDER BY ???  -- without this, the `LIMIT` is unpredictable
        LIMIT 0, 1000;
    

    Index:

    property_inspection:  INDEX(status, parcel_number) -- in either order