I have a table I want to partition using MySQL 5.7 Partitioning to mitigate issues I'm having with dropping old data quickly. (Also, it would be nice to have increased insert I/O performance by partitioning across something other than date, especially if I plan to shard across multiple volumes with subpartitions)
Here is a simplified version of the table:
CREATE TABLE `tbl` (
`date` date NOT NULL,
`sub_id` int(11) unsigned NOT NULL,
`cmd_id` int(11) NOT NULL,
`code` TINYINT DEFAULT NULL,
`rqst` VARCHAR(32) NOT NULL DEFAULT '',
UNIQUE KEY `uk1` (sub_id,cmd_id,date)
) ENGINE=InnoDB
(note that use of column 'date' in uk1 is only to allow partitioning on date)
(The true unique key is (sub_id,cmd_id))
Here are the SQL statements I make on that table:
1. INSERT INTO tbl (NOW(), ...)
2. UPDATE tbl SET code=$code WHERE sub_id=$sub_id AND cmd_id=$cmd_id
3. SELECT code,rqst FROM tbl WHERE sub_id=$sub_id AND cmd_id=$cmd_id
Here is the partitioning scheme I've devised so far:
PARTITION BY RANGE (TO_DAYS(date))
SUBPARTITION BY HASH(sub_id)
SUBPARTITIONS 4
(PARTITION d001 VALUES LESS THAN (736250) ENGINE = InnoDB,
PARTITION d002 VALUES LESS THAN (736260) ENGINE = InnoDB,
PARTITION d003 VALUES LESS THAN (736270) ENGINE = InnoDB,
PARTITION d004 VALUES LESS THAN (736280) ENGINE = InnoDB,
PARTITION d005 VALUES LESS THAN (736290) ENGINE = InnoDB,
PARTITION d006 VALUES LESS THAN (736300) ENGINE = InnoDB,
PARTITION d007 VALUES LESS THAN (736310) ENGINE = InnoDB,
PARTITION d008 VALUES LESS THAN (736320) ENGINE = InnoDB,
PARTITION d009 VALUES LESS THAN (736330) ENGINE = InnoDB,
PARTITION d010 VALUES LESS THAN (736340) ENGINE = InnoDB,
PARTITION d011 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
However I believe this will hurt performance by requiring a read per each partition every time i reference (sub_id,cmd_id):
EXPLAIN PARTITIONS SELECT * FROM tbl WHERE sub_id='107' AND cmd_id='2246806';
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
| 1 | SIMPLE | optz | d001_d001sp1,d002_d002sp1,d003_d003sp1,d004_d004sp1,d005_d005sp1,d006_d006sp1,d007_d007sp1,d008_d008sp1,d009_d009sp1,d010_d010sp1,d011_d011sp1 | ref | uk1 | uk1 | 38 | const,const | 11 | Using where |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
So the crux of my problem is:
Here are some notes/caveats:
date
column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.Thanks!
BY HASH
is essentially useless, as are SUBPARTITIONs
.
mitigate issues I'm having with dropping old data quickly.
That is, you need to DROP PARTITION
for old date
? Use PARTITION BY RANGE (TO_DAYS(date))
and don't bother with the subpartitioning.
For clarity, change UNIQUE KEY uk1 (sub_id,cmd_id,date)
to PRIMARY KEY (sub_id,cmd_id,date)
.
[belated edited] Your three queries will work reasonably well with such. The SELECT
and UPDATE
will have to hit all partitions since date
is not in the WHERE
clause. The INSERT
will hit only the latest partition (because of NOW()
).
More discussion, including tips on doing the periodic purging: http://mysql.rjweb.org/doc.php/partitionmaint
Only need to keep past ~month of data
Recommend about 32 partitions -- one pending DROP
, one future
; see the link.
A replication system is in place
Doing the ALTER TABLE
to add partitioning will stall the system, but I guess you understand the issue there.
I didn't want to include the date column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.
Yeah, a necessary evil.
5-20million rows/day
That's a max of a few hundred per second? If you have ingestion speed problems, see http://mysql.rjweb.org/doc.php/staging_table