Search code examples
mysqloptimizationindexingsizeinnodb

Need inputs to optimize Big size Mysql - Innodb Table


Number of records in table_abc: 236,364,206

Yearly Records:

Year    Number of records
2009    1
2011    4
2012    8
2013    25
2014    70
2015    175
2016    393
2017    1010
2018    332086
2019    7605438
2020    32268473
2021    73318801
2022    100285147
2023    22554330

Index Information

Index Information

Problems:

  1. How I can optimize this table on production server.

  2. There are unused indexes on this table, which I found from performance_schema.table_io_waits_summary_by_index_usage, where count_star = 0, but I can not directly drop the indexes from production server, as there may be processes which runs monthly or yearly, which may use some of this unused indexes from this table.

  3. There is below query, which is using above table and taking approx 7 minutes to execute

EXPLAIN SELECT o.*  FROM table_abc o 
WHERE o.cId = '3' AND (o.created <= '2023-04-27 11:45:39' OR o.created IS NULL) AND booking IS NOT NULL AND MOD((o.accId - '19'), 21) = 0 AND o.exported IS NULL 

Explain OP

Explain OP

Solutions I am thinking to implement is as follows:

1.

  • Need to archive backdated data and need to keep some limited time frame data: As I am new to the application not sure about this.
  • Need to create partitions on this table, monthly/yearly
  1. What are the prechecks we need to do or we need to take care, to remove the unused indexes from the production server. I have prepared the list of unused indexes using performance schema when the DB uptime was 1 day, and after 35 days again I took the list of unused indexes, so like this I covered monthly process, but for yearly one either I need to wait or need to check code to find out which tables are in use for yearly processes. Which is time consuming and again there will be a risk if I miss some processes.

  2. I tried to limit the time frame, there is difference in number of rows, but not much difference in execution time.

EXPLAIN SELECT o.*  FROM table_abc o 
WHERE o.cId = '3' AND ((**o.created >= '2022-04-27 11:45:39'** AND o.created <= '2023-04-27 11:45:39') OR o.created IS NULL) AND booking IS NOT NULL AND MOD((o.accId - '19'), 21) = 0 AND o.exported IS NULL 

Explain OP

Explain OP

Show create table OP:

CREATE TABLE `table_abc` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `externalNIId` varchar(40) DEFAULT NULL,
  `cId` int(11) unsigned NOT NULL,
  `accId` int(11) unsigned NOT NULL,
  `scheId` int(111) DEFAULT NULL,
  `archiveId` int(11) unsigned DEFAULT NULL,
  `booking` int(11) unsigned DEFAULT NULL,
  `created` datetime NOT NULL,
  `imported` datetime DEFAULT NULL,
  `accounted` date DEFAULT NULL,
  `payable` date DEFAULT NULL,
  `chargedDepositMonth` date DEFAULT NULL,
  `newpayable` date DEFAULT NULL,
  `type` enum('depo','invo','fee','return_fee','closing','bon','oth','late_int','dunfee_lawyer','liab_depo','liabinvo','liabi_mmma','liab_dunfee','instabonu','addon','emobi','fairness_bonu','cust','msb_bon','commendation_bon','addi_pay','add_fin_pay','derecog_debi','derecog_credi','desir_invo_fee','fle_bonu','fee_in','disconnectfee','reconnectfee','feed_in_remune','ret_deliv_es','cred_one_time_pay','ene_cos_aid_am','depo_sew','depo_wat','invo_sew','invo_wat') NOT NULL,
  `ene` enum('elec','g','wa','sew','he','pre_wat') DEFAULT NULL,
  `groAm` decimal(16,2) NOT NULL,
  `setAm` decimal(16,2) NOT NULL,
  `tax` decimal(5,2) NOT NULL,
  `noti` varchar(255) DEFAULT NULL,
  `wriOffRea` char(2) DEFAULT NULL,
  `datevExpo` datetime DEFAULT NULL,
  `firstDu` datetime DEFAULT NULL,
  `secondDu` datetime DEFAULT NULL,
  `thirdDu` datetime DEFAULT NULL,
  `isDe` tinyint(1) NOT NULL DEFAULT '0',
  `isPayblk` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `payBlk` date DEFAULT NULL,
  `exported` datetime DEFAULT NULL,
  `exportArch` int(11) DEFAULT NULL,
  `invoid` int(11) NOT NULL DEFAULT '0',
  `moddate` datetime DEFAULT NULL ,
  `extConid` varchar(15) DEFAULT NULL,
  `exteyid` varchar(50) DEFAULT NULL,
  `mg_wa` varchar(6) DEFAULT NULL,
  `extalMetParNu` bigint(20) DEFAULT NULL,
  `valAdjAmount` decimal(12,2) DEFAULT NULL ,
  `txKey` varchar(5) DEFAULT NULL,
  `txTyp` varchar(40) DEFAULT NULL,
  `bokat` date NOT NULL,
  `isFulset` tinyint(1) DEFAULT NULL,
  `isLoc` tinyint(1) unsigned NOT NULL DEFAULT '0' ,
  PRIMARY KEY (`id`),
  UNIQUE KEY `bookingForClient` (`cId`,`booking`),
  KEY `cId` (`cId`),
  KEY `accountId` (`accountId`),
  KEY `scheId` (`scheduleId`),
  KEY `created` (`created`),
  KEY `type` (`type`),
  KEY `exported` (`exported`),
  KEY `client_noti` (`cId`,`noti`),
  KEY `accounted` (`accounted`),
  KEY `moddate` (`moddate`),
  KEY `cId_invoid` (`cId`,`invoid`),
  KEY `invoid_idx` (`invoid`),
  KEY `extConid` (`extConid`),
  KEY `exteyid` (`exteyid`),
  KEY `externalNIId` (`externalNIId`),
  KEY `mg_wa` (`mg_wa`),
  KEY `extalMetParNu` (`extalMetParNu`),
  KEY `account_type` (`accountId`,`type`),
  KEY `groAm` (`groAm`),
  KEY `payable` (`payable`),
  KEY `bokat` (`bokat`),
  KEY `account_receivable_idx_accountid_groAm` (`accountId`,`groAm`),
  KEY `account_receivable_idx_cId_created_exported` (`cId`,`created`,`exported`),
  KEY `isFulset` (`isFulset`),
  KEY `account_receivable_idx_invoid_groAm` (`invoid`,`groAm`),
  KEY `boing` (`booking`),
  KEY `book` (`booking`),
  KEY `account_receivables_idx_cId_isFulset_payable` (`cId`,`isFulset`,`payable`),
  CONSTRAINT `fk_a_acc` FOREIGN KEY (`accountId`) REFERENCES `acco` (`id`),
  CONSTRAINT `fk_a_cli` FOREIGN KEY (`cId`) REFERENCES `clie` (`id`),
  CONSTRAINT `fk_a_sche` FOREIGN KEY (`sched`) REFERENCES `accr` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=254601015 DEFAULT CHARSET=utf8

NOTE: I have changed the column names before uploading here.


Solution

  • "Unused indexes" are really used -- OK, leave them in place.

    "Optimize" the table -- First, let's see SHOW CREATE TABLE and the more important queries. With 236M rows, lots of things need to be looked at.

    An urgent thing to check on is whether you have any INT AUTO_INCREMENT that is likely to overflow 2 (or 4) billion soon. The SHOW will give you a clue.

    This query...

    SELECT  o.*
        FROM  table_abc o
        WHERE  o.Id = '3'
          AND  (o.created <= '2023-04-27 11:45:39'
                  OR  o.created IS NULL
               )
          AND  booking IS NOT NULL
          AND  MOD((o.accountId - '19'), 21) = 0
          AND  o.exported IS NULL 
    

    Notes on that query:

    • OR is inefficient. Do you really have some abc's that have never been "created"?
    • The MOD cuts down on the number of rows delivered, but it does not otherwise speed up the query much.
    • If abc had PRIMARY KEY(Id) then there is only one row. So I am confused as to why it is slow. (Again, I need to see the SHOW.)
    • If '2023-04-27 11:45:39' is NOW(), imply use NOW(). (No performance improvement, just simplicity.) Also, is it possible that some rows were created in the future??
    • "Using intersect" almost always begs for a composite index: INDEX(exported, Id)
    • What version of MySQL? Newer ones allow many index operations to not impact performance.
    • Do not rush into PARTITIONing. It is unlikely to provide any performance impact. (I may change my tune after I see the important queries.)

    auto_inc about to overflow

    First, check whether the 236M is the max id currently. I see from the end of the CREATE TABLE shows 254M, so there have been only a few lost values. This happens with DELETE, INSERT IGNORE, and many other ways of "burning ids". The table is 1/16th the way to the max for INT UNSIGNED (4 billion). Your growth rate was tripping every year, but slowed down during Covid. If it speeds up again, you will hit the max in less than 3 years. If it levels out, you might last more than 10 years.

    The simplest solution is to change id to BIGINT in this table and in any other table referencing it. But the downtime to do that will be painful -- the ALTER will take a long time, regardless of what version of MySQL you are running.

    Switching to a different PK is an alternative, but it is more work and not straightforward. However, you have already identified a candidate:

    UNIQUE KEY `bookingForClient` (`cId`,`booking`)
    

    I hope that neither of those INTs are growing large.

    A side effect of changing the PK: Note that each secondary index has a copy of the PK in its BTree. You have a huge number (28) of secondary indexes, each of which currently has the 4-byte id tacked on. That adds up to maybe 40B disk space overhead. That is, switching id to BIGINT or switching to PK(cId,booking) will add another 40GB of disk space.

    Getting rid of id and promoting to PRIMARY KEY(cId,booking) would require any other tables that refer to this table to use a different JOIN..ON -- this adds coding effort on your part. There is a bonus -- all indexes starting with cId (about 5) can be dropped. Some queries will run faster, some slower. Probably the overall performance change is for the better.

    Indexes to get rid of

    It is relatively safe to drop INDEX(a) when you also have INDEX(a,b). I see at least 3 cases: accountId, cId, invoid. The longer index will take over where you need the shorter id. Ditto for dropping INDEX(a,b) when you have INDEX(a,b,c...), though I don't see such a case.

    Also, I see one dup index -- booking

    INDEX(flag) is rarely useful. Here I am assuming flag is a column with two (or only a few) distinct values. Such an index will not be used for any 'common' value, but may be useful for a rare (<20%) value.

    In contrast, INDEX(flag, otherCol) if often useful, regardless of which flag value you are searching for (assuming the WHERE clause has both columns).

    Vertical split?

    40 columns is somewhat large, but I don't know the app, so I cannot say whether splitting them into two or more tables would be beneficial. 14 Date/Datetime columns seems like a lot, too.

    Datatype Sizes

    Some tables I see have BIGINT for all numeric fields; this is a gross waste of most of 8 bytes each. You have INT, TINYINT, and ENUM. So, unless some of those INTs (4 bytes) can be shrunk to MEDIUMINT or SMALLINT, I don't have much to say. (Note: a one-byte shrinkage equates to 236MB, more if the column is indexed.)

    MySQL 5.7

    Most of the ALTER TABLE ADD/DROP INDEX improvements are in your version; I don't see an urgent need to upgrade to 8.0 for that. However, the upgrade will be time-consuming (not nearly as easy as previous upgrades).

    "Index merge intersect" (not 'union') -- as already pointed out, build a composite index, then drop the index containing the left column(s) of the new index.

    The 7-minute query

    I see no rational reason why WHERE o.Id = '3' AND... did not use the PK for lookup; that should have taken about 1ms, not 420000ms. ANALYZE TABLE is fast and may change the statistics such that it will be fast.

    Hmmm... o.created IS NULL, yet created datetime NOT NULL -- Remove the unnecessary OR.

    WHERE PK = constant

    When I see a query that looks up a single row by the PK, I wonder if there was a preceding query that found that id. In general, combining those two queries is possible and will run upwards of twice as fast.

    slowlog

    Analyzing the SlowLog is a quick way to discover which queries are the "worst". Fixing the worst would help with performance. Seeing them would help with question about Partitioning.