Search code examples
mysqlquery-optimization

Slow LIMIT Queries on MySQL


Summary: I'm trying to export data from a large partitioned MySQL table, but the LIMIT queries are taking a long time.

Hi, I have a large MySQL database (AWS Aurora 5.7). Total size about 5TB. One table is 2.5TB. Here is the table structure

column type
partition int(10) unsigned primary key
user_id binary(16) primary key
group_id binary(16) primary key
hash binary(16) primary key
value json

For each day there is a new partition created. It is a range partition. There are about 700 partitions now.

I'm trying to export the entire table. Running queries 1 partition at a time using LIMIT queries like this:

SELECT HEX(`hash`), `value`
FROM my_table
WHERE partition=... AND user_id=... AND  group_id=...
LIMIT 200000, 100000

Such queries take a range of 1-3 minutes each. Once I get the result, I run the next query increasing the "LIMIT start index". The first query with "LIMIT 0" is faster. The bigger "LIMIT" is, the slower it takes.

For example if I query only 1000 rows, without any "WHERE" conditions, here are the results:

LIMIT       0, 1000 -> 0.000sec
LIMIT    1000, 1000 -> 0.015sec
LIMIT   10000, 1000 -> 0.094sec
LIMIT  100000, 1000 -> 0.400sec
LIMIT 1000000, 1000 -> 5.781sec

How do I make the queries faster?


Solution

  • Large OFFSET values are a challenge for a database that indexes by value, not by row number or position. I answered a similar question here: MySQL - Slow performance for limit offset on json column

    However, the solution in that question won't work for you, because you haven't specified an order for the rows returned by your query. I can guess that the rows in your selected set have no meaningful order, because the only column of the PK you aren't filtering by is a hash.

    I don't know what to suggest for you, except to increase the CPU and RAM of your database instance, to hopefully hold more data in RAM and make access faster.

    Another possibility is to create a new column for subpartitioning, and use that instead of LIMIT and OFFSET.

    There aren't any good logical fixes.