Tags: mysql, sql, performance, query-performance

MySQL query 1000x slower with slightly higher LIMIT (not offset)


I have a MySQL database with about 12 million records. I use the following query to retrieve the required rows from it:

SELECT date_time, price_l0, amount_l0, price_l1, amount_l1, price_l2, amount_l2, price_l3,  /* 34 more columns */
FROM book_states
WHERE date_time > ? and
      date_time < ? and
      bookID = ?
ORDER BY date_time ASC
LIMIT 4350

The problem is that when I use a LIMIT of about 4340, this query takes about 0.002/0.15 seconds to run. However, if I use a LIMIT of, say, 4350, it takes 3.0/0.15 seconds (!) to run.

If I select fewer columns, the threshold between a very fast and a very slow query is slightly higher, but it still takes 3 seconds or more, even when I select only one column, once the LIMIT is above 5000.

Now I suspect this is a MySQL setup problem or some sort of RAM limitation, but since I am not a MySQL expert by any means, I'm asking you to explain what causes this drastic performance issue.

EDIT: This is the JSON EXPLAIN output for a query that takes 3 seconds:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "282333.60"
    },
    "ordering_operation": {
      "using_filesort": true,
      "table": {
        "table_name": "book_states",
        "access_type": "ref",
        "possible_keys": [
          "index1",
          "index2",
          "index3"
        ],
        "key": "index2",
        "used_key_parts": [
          "bookID"
        ],
        "key_length": "2",
        "ref": [
          "const"
        ],
        "rows_examined_per_scan": 235278,
        "rows_produced_per_join": 81679,
        "filtered": "34.72",
        "index_condition": "(`datastore`.`book_states`.`bookID` <=> 29)",
        "cost_info": {
          "read_cost": "235278.00",
          "eval_cost": "16335.84",
          "prefix_cost": "282333.60",
          "data_read_per_join": "14M"
        },
        "used_columns": [
          "id",
          "date_time",
          "bookID"
        ],
        "attached_condition": "((`datastore`.`book_states`.`date_time` > '2018-09-28T16:18:49') and (`datastore`.`book_states`.`date_time` < '2018-09-29T23:18:49'))"
      }
    }
  }
}
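
For reference, plan output in this form comes from prefixing the query with EXPLAIN FORMAT=JSON. A sketch, with the placeholders filled in from the index_condition and attached_condition visible in the plan above:

EXPLAIN FORMAT=JSON
SELECT date_time, price_l0, amount_l0 /* remaining columns omitted */
FROM book_states
WHERE date_time > '2018-09-28 16:18:49'
  AND date_time < '2018-09-29 23:18:49'
  AND bookID = 29
ORDER BY date_time ASC
LIMIT 4350;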

Solution

  • The best index for your query is (bookID, date_time). Note the order of the columns; it is quite important.

    MySQL is struggling to optimize your query with the indexes at hand. It can select the records using the date_time part of an existing index (or an index on bookID) and then sort the results.

    Or, it can scan your compound index (which has records ordered by date/time), filtering out the unneeded books as it goes.

    The flip between these two methods is (presumably) what you are seeing. Which one the optimizer picks depends on the gathered statistics, and those statistics necessarily provide only partial information.

    So, switch the columns in the index (see the example below) and the problem should go away, at least for this particular query.
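
    For example, the new index could be added like this (the index name idx_book_datetime is just an illustrative placeholder, not something from the question):

    ALTER TABLE book_states
      ADD INDEX idx_book_datetime (bookID, date_time);

    With this index the optimizer can seek directly to bookID = 29 and read the rows already ordered by date_time, avoiding both the filesort and the date_time filtering (filtered: 34.72%) seen in the plan.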