Number of records in table_abc: 236,364,206
Yearly Records:
Year Number of records
2009 1
2011 4
2012 8
2013 25
2014 70
2015 175
2016 393
2017 1010
2018 332086
2019 7605438
2020 32268473
2021 73318801
2022 100285147
2023 22554330
Index Information
Problems:
How I can optimize this table on production server.
There are unused indexes on this table, which I found from performance_schema.table_io_waits_summary_by_index_usage
, where count_star = 0
, but I can not directly drop the indexes from production server, as there may be processes which runs monthly or yearly, which may use some of this unused indexes from this table.
There is below query, which is using above table and taking approx 7 minutes to execute
EXPLAIN SELECT o.* FROM table_abc o
WHERE o.cId = '3' AND (o.created <= '2023-04-27 11:45:39' OR o.created IS NULL) AND booking IS NOT NULL AND MOD((o.accId - '19'), 21) = 0 AND o.exported IS NULL
Explain OP
Solutions I am thinking to implement is as follows:
1.
What are the prechecks we need to do or we need to take care, to remove the unused indexes from the production server. I have prepared the list of unused indexes using performance schema when the DB uptime was 1 day, and after 35 days again I took the list of unused indexes, so like this I covered monthly process, but for yearly one either I need to wait or need to check code to find out which tables are in use for yearly processes. Which is time consuming and again there will be a risk if I miss some processes.
I tried to limit the time frame, there is difference in number of rows, but not much difference in execution time.
EXPLAIN SELECT o.* FROM table_abc o
WHERE o.cId = '3' AND ((**o.created >= '2022-04-27 11:45:39'** AND o.created <= '2023-04-27 11:45:39') OR o.created IS NULL) AND booking IS NOT NULL AND MOD((o.accId - '19'), 21) = 0 AND o.exported IS NULL
Explain OP
Show create table OP:
CREATE TABLE `table_abc` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`externalNIId` varchar(40) DEFAULT NULL,
`cId` int(11) unsigned NOT NULL,
`accId` int(11) unsigned NOT NULL,
`scheId` int(111) DEFAULT NULL,
`archiveId` int(11) unsigned DEFAULT NULL,
`booking` int(11) unsigned DEFAULT NULL,
`created` datetime NOT NULL,
`imported` datetime DEFAULT NULL,
`accounted` date DEFAULT NULL,
`payable` date DEFAULT NULL,
`chargedDepositMonth` date DEFAULT NULL,
`newpayable` date DEFAULT NULL,
`type` enum('depo','invo','fee','return_fee','closing','bon','oth','late_int','dunfee_lawyer','liab_depo','liabinvo','liabi_mmma','liab_dunfee','instabonu','addon','emobi','fairness_bonu','cust','msb_bon','commendation_bon','addi_pay','add_fin_pay','derecog_debi','derecog_credi','desir_invo_fee','fle_bonu','fee_in','disconnectfee','reconnectfee','feed_in_remune','ret_deliv_es','cred_one_time_pay','ene_cos_aid_am','depo_sew','depo_wat','invo_sew','invo_wat') NOT NULL,
`ene` enum('elec','g','wa','sew','he','pre_wat') DEFAULT NULL,
`groAm` decimal(16,2) NOT NULL,
`setAm` decimal(16,2) NOT NULL,
`tax` decimal(5,2) NOT NULL,
`noti` varchar(255) DEFAULT NULL,
`wriOffRea` char(2) DEFAULT NULL,
`datevExpo` datetime DEFAULT NULL,
`firstDu` datetime DEFAULT NULL,
`secondDu` datetime DEFAULT NULL,
`thirdDu` datetime DEFAULT NULL,
`isDe` tinyint(1) NOT NULL DEFAULT '0',
`isPayblk` tinyint(1) unsigned NOT NULL DEFAULT '0',
`payBlk` date DEFAULT NULL,
`exported` datetime DEFAULT NULL,
`exportArch` int(11) DEFAULT NULL,
`invoid` int(11) NOT NULL DEFAULT '0',
`moddate` datetime DEFAULT NULL ,
`extConid` varchar(15) DEFAULT NULL,
`exteyid` varchar(50) DEFAULT NULL,
`mg_wa` varchar(6) DEFAULT NULL,
`extalMetParNu` bigint(20) DEFAULT NULL,
`valAdjAmount` decimal(12,2) DEFAULT NULL ,
`txKey` varchar(5) DEFAULT NULL,
`txTyp` varchar(40) DEFAULT NULL,
`bokat` date NOT NULL,
`isFulset` tinyint(1) DEFAULT NULL,
`isLoc` tinyint(1) unsigned NOT NULL DEFAULT '0' ,
PRIMARY KEY (`id`),
UNIQUE KEY `bookingForClient` (`cId`,`booking`),
KEY `cId` (`cId`),
KEY `accountId` (`accountId`),
KEY `scheId` (`scheduleId`),
KEY `created` (`created`),
KEY `type` (`type`),
KEY `exported` (`exported`),
KEY `client_noti` (`cId`,`noti`),
KEY `accounted` (`accounted`),
KEY `moddate` (`moddate`),
KEY `cId_invoid` (`cId`,`invoid`),
KEY `invoid_idx` (`invoid`),
KEY `extConid` (`extConid`),
KEY `exteyid` (`exteyid`),
KEY `externalNIId` (`externalNIId`),
KEY `mg_wa` (`mg_wa`),
KEY `extalMetParNu` (`extalMetParNu`),
KEY `account_type` (`accountId`,`type`),
KEY `groAm` (`groAm`),
KEY `payable` (`payable`),
KEY `bokat` (`bokat`),
KEY `account_receivable_idx_accountid_groAm` (`accountId`,`groAm`),
KEY `account_receivable_idx_cId_created_exported` (`cId`,`created`,`exported`),
KEY `isFulset` (`isFulset`),
KEY `account_receivable_idx_invoid_groAm` (`invoid`,`groAm`),
KEY `boing` (`booking`),
KEY `book` (`booking`),
KEY `account_receivables_idx_cId_isFulset_payable` (`cId`,`isFulset`,`payable`),
CONSTRAINT `fk_a_acc` FOREIGN KEY (`accountId`) REFERENCES `acco` (`id`),
CONSTRAINT `fk_a_cli` FOREIGN KEY (`cId`) REFERENCES `clie` (`id`),
CONSTRAINT `fk_a_sche` FOREIGN KEY (`sched`) REFERENCES `accr` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=254601015 DEFAULT CHARSET=utf8
NOTE: I have changed the column names before uploading here.
"Unused indexes" are really used -- OK, leave them in place.
"Optimize" the table -- First, let's see SHOW CREATE TABLE
and the more important queries. With 236M rows, lots of things need to be looked at.
An urgent thing to check on is whether you have any INT AUTO_INCREMENT
that is likely to overflow 2 (or 4) billion soon. The SHOW
will give you a clue.
This query...
SELECT o.*
FROM table_abc o
WHERE o.Id = '3'
AND (o.created <= '2023-04-27 11:45:39'
OR o.created IS NULL
)
AND booking IS NOT NULL
AND MOD((o.accountId - '19'), 21) = 0
AND o.exported IS NULL
Notes on that query:
OR
is inefficient. Do you really have some abc's that have never been "created"?MOD
cuts down on the number of rows delivered, but it does not otherwise speed up the query much.abc
had PRIMARY KEY(Id)
then there is only one row. So I am confused as to why it is slow. (Again, I need to see the SHOW
.)'2023-04-27 11:45:39'
is NOW()
, imply use NOW()
. (No performance improvement, just simplicity.) Also, is it possible that some rows were created in the future??INDEX(exported, Id)
PARTITIONing
. It is unlikely to provide any performance impact. (I may change my tune after I see the important queries.)auto_inc about to overflow
First, check whether the 236M is the max id currently. I see from the end of the CREATE TABLE
shows 254M, so there have been only a few lost values. This happens with DELETE
, INSERT IGNORE
, and many other ways of "burning ids". The table is 1/16th the way to the max for INT UNSIGNED
(4 billion). Your growth rate was tripping every year, but slowed down during Covid. If it speeds up again, you will hit the max in less than 3 years. If it levels out, you might last more than 10 years.
The simplest solution is to change id to BIGINT
in this table and in any other table referencing it. But the downtime to do that will be painful -- the ALTER
will take a long time, regardless of what version of MySQL you are running.
Switching to a different PK is an alternative, but it is more work and not straightforward. However, you have already identified a candidate:
UNIQUE KEY `bookingForClient` (`cId`,`booking`)
I hope that neither of those INTs are growing large.
A side effect of changing the PK: Note that each secondary index has a copy of the PK in its BTree. You have a huge number (28) of secondary indexes, each of which currently has the 4-byte id tacked on. That adds up to maybe 40B disk space overhead. That is, switching id
to BIGINT
or switching to PK(cId
,booking
) will add another 40GB of disk space.
Getting rid of id
and promoting to PRIMARY KEY(cId,booking)
would require any other tables that refer to this table to use a different JOIN..ON
-- this adds coding effort on your part. There is a bonus -- all indexes starting with cId
(about 5) can be dropped. Some queries will run faster, some slower. Probably the overall performance change is for the better.
Indexes to get rid of
It is relatively safe to drop INDEX(a)
when you also have INDEX(a,b)
. I see at least 3 cases: accountId
, cId
, invoid
. The longer index will take over where you need the shorter id. Ditto for dropping INDEX(a,b)
when you have INDEX(a,b,c...)
, though I don't see such a case.
Also, I see one dup index -- booking
INDEX(flag)
is rarely useful. Here I am assuming flag
is a column with two (or only a few) distinct values. Such an index will not be used for any 'common' value, but may be useful for a rare (<20%) value.
In contrast, INDEX(flag, otherCol)
if often useful, regardless of which flag value you are searching for (assuming the WHERE
clause has both columns).
Vertical split?
40 columns is somewhat large, but I don't know the app, so I cannot say whether splitting them into two or more tables would be beneficial. 14 Date/Datetime columns seems like a lot, too.
Datatype Sizes
Some tables I see have BIGINT
for all numeric fields; this is a gross waste of most of 8 bytes each. You have INT, TINYINT, and ENUM. So, unless some of those INTs (4 bytes) can be shrunk to MEDIUMINT
or SMALLINT
, I don't have much to say. (Note: a one-byte shrinkage equates to 236MB, more if the column is indexed.)
MySQL 5.7
Most of the ALTER TABLE ADD/DROP INDEX
improvements are in your version; I don't see an urgent need to upgrade to 8.0 for that. However, the upgrade will be time-consuming (not nearly as easy as previous upgrades).
"Index merge intersect" (not 'union') -- as already pointed out, build a composite index, then drop the index containing the left column(s) of the new index.
The 7-minute query
I see no rational reason why WHERE o.Id = '3' AND...
did not use the PK for lookup; that should have taken about 1ms, not 420000ms. ANALYZE TABLE
is fast and may change the statistics such that it will be fast.
Hmmm... o.created IS NULL
, yet created datetime NOT NULL
-- Remove the unnecessary OR
.
WHERE PK = constant
When I see a query that looks up a single row by the PK, I wonder if there was a preceding query that found that id
. In general, combining those two queries is possible and will run upwards of twice as fast.
slowlog
Analyzing the SlowLog is a quick way to discover which queries are the "worst". Fixing the worst would help with performance. Seeing them would help with question about Partitioning.