Summary: I'm trying to export data from a large partitioned MySQL table, but the LIMIT queries are taking a long time.
Hi, I have a large MySQL database (AWS Aurora 5.7), about 5 TB in total; one table is 2.5 TB. Here is the table structure:
| column | type | key |
|---|---|---|
| `partition` | int(10) unsigned | primary key |
| `user_id` | binary(16) | primary key |
| `group_id` | binary(16) | primary key |
| `hash` | binary(16) | primary key |
| `value` | json | |
A new range partition is created each day; there are about 700 partitions now. I'm trying to export the entire table, one partition at a time, using LIMIT queries like this:
```sql
SELECT HEX(`hash`), `value`
FROM my_table
WHERE `partition` = ... AND user_id = ... AND group_id = ...
LIMIT 200000, 100000;
```
Each such query takes 1-3 minutes. Once I get a result, I run the next query with a larger OFFSET (the first argument to LIMIT). The query with OFFSET 0 is fastest; the larger the OFFSET, the slower the query.
For example, fetching only 1000 rows, without any WHERE conditions:

```
LIMIT 0, 1000       -> 0.000 sec
LIMIT 1000, 1000    -> 0.015 sec
LIMIT 10000, 1000   -> 0.094 sec
LIMIT 100000, 1000  -> 0.400 sec
LIMIT 1000000, 1000 -> 5.781 sec
```
How do I make the queries faster?
Large OFFSET values are a challenge for a database that indexes by value, not by row number or position. I answered a similar question here: MySQL - Slow performance for limit offset on json column
However, the solution in that question won't work for you, because your query specifies no order for the returned rows. My guess is that the rows in your result set have no meaningful order anyway, since the only primary-key column you aren't filtering on is a hash.
I don't have much to suggest except increasing the CPU and RAM of your database instance, so that more of the data stays in memory and access is faster.
Another possibility is to create a new column for subpartitioning, and use that instead of LIMIT and OFFSET.
There aren't any good logical fixes.
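To illustrate the subpartitioning idea, here is a minimal sketch. It assumes you derive a `chunk` column from the first byte of `hash` (values 0-255) using a MySQL 5.7 stored generated column, so each daily partition can be exported in 256 fixed slices with no OFFSET at all. The column name, index name, and partition value are hypothetical, and an online schema-change tool would be advisable for an ALTER on a 2.5 TB table:

```sql
-- Hypothetical: split each partition into 256 slices keyed on
-- the first byte of `hash`, instead of paging with OFFSET.
ALTER TABLE my_table
  ADD COLUMN chunk TINYINT UNSIGNED
    AS (ORD(SUBSTRING(`hash`, 1, 1))) STORED,
  ADD KEY idx_partition_chunk (`partition`, chunk);

-- Export one slice at a time; each query reads only its own slice:
SELECT HEX(`hash`), `value`
FROM my_table
WHERE `partition` = 123 AND chunk = 0;
-- then repeat for chunk = 1, 2, ..., 255
```

Because the hash bytes are effectively uniform, each slice should hold roughly 1/256 of the partition's rows, and no query ever has to skip over rows the way a large OFFSET does.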