Search code examples
mysqlquery-optimizationpartitioning

What is the best way to speed up collecting query from huge log table?


I have MySQL database log table that increasing daily with 5m of data and I have issue with collecting data from that table to make some analyze counts.

I have listed the details of the problem as follows:

This is my log table:

CREATE TABLE `details` (
    `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `provider` VARCHAR(25) NULL COLLATE 'utf8mb4_unicode_ci',
    `DLR_Status` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
    `source` VARCHAR(30) NULL COLLATE 'utf8mb4_bin',
    `Destination` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
    `msg` VARCHAR(1000) NULL COLLATE 'utf8mb4_unicode_ci',
    `timestamp` TIMESTAMP NULL,
    `msg_timestamp` INT NOT NULL,
    `created_at` TIMESTAMP NULL,
    `updated_at` TIMESTAMP NULL,
    PRIMARY KEY (`id`, `msg_timestamp`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
AUTO_INCREMENT=24169513
/*!50100 PARTITION BY RANGE (`msg_timestamp`)
(PARTITION p2016_02 VALUES LESS THAN (1456779600) ENGINE = InnoDB,
 PARTITION p2016_03 VALUES LESS THAN (1459458000) ENGINE = InnoDB,
 PARTITION p2016_04 VALUES LESS THAN (1462050000) ENGINE = InnoDB,
 PARTITION p2016_05 VALUES LESS THAN (1464728400) ENGINE = InnoDB,
 PARTITION p2016_06 VALUES LESS THAN (1467320400) ENGINE = InnoDB,
 PARTITION p2016_07 VALUES LESS THAN (1469998800) ENGINE = InnoDB,
 PARTITION p2016_08 VALUES LESS THAN (1472677200) ENGINE = InnoDB,
 PARTITION p2016_09 VALUES LESS THAN (1475269200) ENGINE = InnoDB,
 PARTITION p2016_10 VALUES LESS THAN (1477947600) ENGINE = InnoDB,
 PARTITION p2016_11 VALUES LESS THAN (1480539600) ENGINE = InnoDB,
 PARTITION p2016_12 VALUES LESS THAN (1483218000) ENGINE = InnoDB,
 PARTITION p2017_01 VALUES LESS THAN (1485896400) ENGINE = InnoDB,
 PARTITION p2017_02 VALUES LESS THAN (1488315600) ENGINE = InnoDB,
 PARTITION p2017_03 VALUES LESS THAN (1490994000) ENGINE = InnoDB,
 PARTITION p2017_04 VALUES LESS THAN (1493586000) ENGINE = InnoDB,
 PARTITION p2017_05 VALUES LESS THAN (1496264400) ENGINE = InnoDB,
 PARTITION p2017_06 VALUES LESS THAN (1498856400) ENGINE = InnoDB,
 PARTITION p2017_07 VALUES LESS THAN (1501534800) ENGINE = InnoDB,
 PARTITION p2017_08 VALUES LESS THAN (1504213200) ENGINE = InnoDB,
 PARTITION p2017_09 VALUES LESS THAN (1506805200) ENGINE = InnoDB,
 PARTITION p2017_10 VALUES LESS THAN (1509483600) ENGINE = InnoDB,
 PARTITION p2017_11 VALUES LESS THAN (1512075600) ENGINE = InnoDB,
 PARTITION p2017_12 VALUES LESS THAN (1514754000) ENGINE = InnoDB,
 PARTITION p2018_01 VALUES LESS THAN (1517432400) ENGINE = InnoDB,
 PARTITION p2018_02 VALUES LESS THAN (1519851600) ENGINE = InnoDB,
 PARTITION p2018_03 VALUES LESS THAN (1522530000) ENGINE = InnoDB,
 PARTITION p2018_04 VALUES LESS THAN (1525122000) ENGINE = InnoDB,
 PARTITION p2018_05 VALUES LESS THAN (1527800400) ENGINE = InnoDB,
 PARTITION p2018_06 VALUES LESS THAN (1530392400) ENGINE = InnoDB,
 PARTITION p2018_07 VALUES LESS THAN (1533070800) ENGINE = InnoDB,
 PARTITION p2018_08 VALUES LESS THAN (1535749200) ENGINE = InnoDB,
 PARTITION p2018_09 VALUES LESS THAN (1538341200) ENGINE = InnoDB,
 PARTITION p2018_10 VALUES LESS THAN (1541019600) ENGINE = InnoDB,
 PARTITION p2018_11 VALUES LESS THAN (1543611600) ENGINE = InnoDB,
 PARTITION p2018_12 VALUES LESS THAN (1546290000) ENGINE = InnoDB,
 PARTITION p2019_01 VALUES LESS THAN (1548968400) ENGINE = InnoDB,
 PARTITION p2019_02 VALUES LESS THAN (1551387600) ENGINE = InnoDB,
 PARTITION p2019_03 VALUES LESS THAN (1554066000) ENGINE = InnoDB,
 PARTITION p2019_04 VALUES LESS THAN (1556658000) ENGINE = InnoDB,
 PARTITION p2019_05 VALUES LESS THAN (1559336400) ENGINE = InnoDB,
 PARTITION p2019_06 VALUES LESS THAN (1561928400) ENGINE = InnoDB,
 PARTITION p2019_07 VALUES LESS THAN (1564606800) ENGINE = InnoDB,
 PARTITION p2019_08 VALUES LESS THAN (1567285200) ENGINE = InnoDB,
 PARTITION p2019_09 VALUES LESS THAN (1569877200) ENGINE = InnoDB,
 PARTITION p2019_10 VALUES LESS THAN (1572555600) ENGINE = InnoDB,
 PARTITION p2019_11 VALUES LESS THAN (1575147600) ENGINE = InnoDB,
 PARTITION p2019_12 VALUES LESS THAN (1577826000) ENGINE = InnoDB,
 PARTITION p2020_01 VALUES LESS THAN (1580504400) ENGINE = InnoDB,
 PARTITION p2020_02 VALUES LESS THAN (1583010000) ENGINE = InnoDB,
 PARTITION p2020_03 VALUES LESS THAN (1585688400) ENGINE = InnoDB,
 PARTITION p2020_04 VALUES LESS THAN (1588280400) ENGINE = InnoDB,
 PARTITION p2020_05 VALUES LESS THAN (1590958800) ENGINE = InnoDB,
 PARTITION p2020_06 VALUES LESS THAN (1593550800) ENGINE = InnoDB,
 PARTITION p2020_07 VALUES LESS THAN (1596229200) ENGINE = InnoDB,
 PARTITION p2020_08 VALUES LESS THAN (1598907600) ENGINE = InnoDB,
 PARTITION p2020_09 VALUES LESS THAN (1601499600) ENGINE = InnoDB,
 PARTITION p2020_10 VALUES LESS THAN (1604178000) ENGINE = InnoDB,
 PARTITION p2020_11 VALUES LESS THAN (1606770000) ENGINE = InnoDB,
 PARTITION p2020_12 VALUES LESS THAN (1609448400) ENGINE = InnoDB,
 PARTITION p2021_01 VALUES LESS THAN (1612126800) ENGINE = InnoDB,
 PARTITION p2021_02 VALUES LESS THAN (1614546000) ENGINE = InnoDB,
 PARTITION p2021_03 VALUES LESS THAN (1617224400) ENGINE = InnoDB,
 PARTITION p2021_04 VALUES LESS THAN (1619816400) ENGINE = InnoDB,
 PARTITION p2021_05 VALUES LESS THAN (1622494800) ENGINE = InnoDB,
 PARTITION p2021_06 VALUES LESS THAN (1625086800) ENGINE = InnoDB,
 PARTITION p2021_07 VALUES LESS THAN (1627765200) ENGINE = InnoDB,
 PARTITION p2021_08 VALUES LESS THAN (1630443600) ENGINE = InnoDB,
 PARTITION p2021_09 VALUES LESS THAN (1633035600) ENGINE = InnoDB,
 PARTITION p2021_10 VALUES LESS THAN (1635714000) ENGINE = InnoDB,
 PARTITION p2021_11 VALUES LESS THAN (1638306000) ENGINE = InnoDB,
 PARTITION p2021_12 VALUES LESS THAN (1640984400) ENGINE = InnoDB,
 PARTITION p2022_01 VALUES LESS THAN (1643662800) ENGINE = InnoDB,
 PARTITION p2022_02 VALUES LESS THAN (1646082000) ENGINE = InnoDB,
 PARTITION p2022_03 VALUES LESS THAN (1648760400) ENGINE = InnoDB,
 PARTITION p2022_04 VALUES LESS THAN (1651352400) ENGINE = InnoDB,
 PARTITION p2022_05 VALUES LESS THAN (1654030800) ENGINE = InnoDB,
 PARTITION p2022_06 VALUES LESS THAN (1656622800) ENGINE = InnoDB,
 PARTITION p2022_07 VALUES LESS THAN (1659301200) ENGINE = InnoDB,
 PARTITION p2022_08 VALUES LESS THAN (1661979600) ENGINE = InnoDB,
 PARTITION p2022_09 VALUES LESS THAN (1664571600) ENGINE = InnoDB,
 PARTITION p2022_10 VALUES LESS THAN (1667250000) ENGINE = InnoDB,
 PARTITION p2022_11 VALUES LESS THAN (1669842000) ENGINE = InnoDB,
 PARTITION p2022_12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)  */;

It's contains log data as follows:

id provider DLR_Status source Destination msg timestamp msg_timestamp
1 KDD done website 01332456 free delivery 2019-12-01 12:00:13 1575201613
2 KDD done By phone 01322422 with cost 300 2019-12-01 12:00:37 1575201637
. . . . . . . .
. . . . . . . .

The issue is that when i select some counts from this table like

SELECT SQL_CALC_FOUND_ROWS DLR_status,count(*) as c 
FROM sms_details 
group by DLR_status;

It's take very long time to give the results and some query give 504 Gateway Time-out error like this query

SELECT SQL_CALC_FOUND_ROWS Destination,count(*) as c 
FROM sms_details 
WHERE msg_timestamp >= UNIX_TIMESTAMP("2019-10-01") and msg_timestamp < UNIX_TIMESTAMP("2019-12-01") group by Destination;

I am already use partitioning in my table and I tried to indexing some column but that make big problem with daily increasing data.

So what is the best practices for the following:

  • Speed up execution time
  • Caring about an insert speed

Solution

  • Shrink schema -- Smaller --> less I/O --> faster.

    timestamp vs msg_timestamp -- These seem to be the same, just in different formats. Of so, toss one of them.

    Normalization speeds up the inserts some -- by shrinking the amount of data. Most of the VARCHARs could be replaced by a 2-byte SMALLINT UNSIGNED or a 3-byte MEDIUMINT UNSIGNED`.

    Future partitions -- Do not have more than one such; the SELECTs will waste time opening them to find nothing.

    Too many partitions -- At some limit (maybe 50), having lots of partitions slows things down.

    Batching is the best speedup. See LOAD DATA or INSERT ... VALUES (...), (...), .... In the latter case, I recommend batches of 1000 rows. (Going beyond that is getting into diminishing returns and possibly some limits.) If the data is coming from multiple sources, explain; then we can talk further.

    Partitioning is very useful for purging 'old' data because DROP PARTITION is a lot faster than DELETE. See http://mysql.rjweb.org/doc.php/partitionmaint

    Toss created_at and updated_at; they are probably useless. (Again, smaller is faster.)

    SQL_CALC_FOUND_ROWS is not needed when you don't have LIMIT; simply observe how many rows are returned. Rethink the user requirement for such. (Come back for more discussion, if desired.)

    DLR_status counts will be a full index scan if you have INDEX(DLR_status). And consider making that column an ENUM so that it is only 1 byte. (If there are several values and/or a growing number of values, then "normalize".)

    Query 2 needs INDEX(Destination, msg_timestmap).

    Is it big? 24M rows / 5 years --> less than 1 row per second. 100 rows/sec is where I start to worry about "high-speed ingestion". That is, I don't see a problem with inserts. Selects, on the other hand, may be a problem. You showed us two; let's see more. I don't want to recommend one index at a time; I would rather design a set of indexes to optimally handle all the likely queries. Especially since it may involve redesigning the partitioning.

    Summary tables are an excellent way to do fast analysis in a "Data warehouse". See http://mysql.rjweb.org/doc.php/summarytables