Search code examples
mysqldatabaseindexinginnodb

MySQL count query takes too long for particular table only


I have one table that contains documents, and on production there are about 1.2 millon records in this table. On this table when I do select count(*) from <table>, it takes too long that at the end I need to restart the DB. On the other hand I also have many other table containing 10-12 million rows but those tables does not have this issue.

These are indexes of that table

mysql> show index from candidates_resume\G
*************************** 1. row ***************************
        Table: candidates_resume
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 843657
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
      Visible: YES
   Expression: NULL
*************************** 2. row ***************************
        Table: candidates_resume
   Non_unique: 0
     Key_name: candidate_id
 Seq_in_index: 1
  Column_name: candidate_id
    Collation: A
  Cardinality: 844009
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
      Visible: YES
   Expression: NULL
*************************** 3. row ***************************
        Table: candidates_resume
   Non_unique: 1
     Key_name: candidates_resume_uploaded_on_e4c78158b8c18f_uniq
 Seq_in_index: 1
  Column_name: uploaded_on
    Collation: A
  Cardinality: 844009
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
      Visible: YES
   Expression: NULL
*************************** 4. row ***************************
        Table: candidates_resume
   Non_unique: 1
     Key_name: candidates_resume_pdf_file_5b052603240d1d43_uniq
 Seq_in_index: 1
  Column_name: pdf_file
    Collation: A
  Cardinality: 844009
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: BTREE
      Comment:
Index_comment:
      Visible: YES
   Expression: NULL
*************************** 5. row ***************************
        Table: candidates_resume
   Non_unique: 1
     Key_name: candidates_resume_watermark_file_68fd6000f27d4f8d_uniq
 Seq_in_index: 1
  Column_name: watermark_file
    Collation: A
  Cardinality: 844009
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: BTREE
      Comment:
Index_comment:
      Visible: YES
   Expression: NULL

And this is result of SHOW CREATE TABLE

Create Table: CREATE TABLE `candidates_resume` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(50) NOT NULL,
  `uploaded_on` datetime NOT NULL,
  `candidate_id` int(11) NOT NULL,
  `file` varchar(100) NOT NULL,
  `hash` varchar(10) NOT NULL,
  `pdf_file` varchar(100) DEFAULT NULL,
  `resume_text` longtext NOT NULL,
  `watermark_file` varchar(100) DEFAULT NULL,
  `html_file` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `candidate_id` (`candidate_id`),
  KEY `candidates_resume_uploaded_on_e4c78158b8c18f_uniq` (`uploaded_on`),
  KEY `candidates_resume_pdf_file_88ec1f31_uniq` (`pdf_file`),
  KEY `candidates_resume_watermark_file_23af2d43_uniq` (`watermark_file`),
  CONSTRAINT `candidate_id_refs_id_88f99c34` FOREIGN KEY (`candidate_id`) REFERENCES `candidates_candidate` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=591098 DEFAULT CHARSET=utf8

Can anyone guide me how can I catch the issue with this table ?


Solution

  • SELECT COUNT(*) FROM ... without any filtering (WHERE) must scan the entire table or an index. This takes time.

    Do EXPLAIN SELECT ... to see how it is handled. I think it will use your UNIQUE(candidate_id). (Please provide SHOW CREATE TABLE.)

    Assuming that candidate_id is INT or BIGINT, the query can't be run much faster.

    Why do you need to count the number of rows. Would an estimate be "good enough"? If so, see SHOW TABLE STATUS or the equivalent query in information_schema.

    If the count from midnight this morning would be "good enough", then perform that and save it somewhere.

    If you can't figure how to avoid timeout, see wait_timeout. Caution; there are several flavors of it.

    With a Summary Table

    Build and maintain a table that keeps, say, the hourly counts of rows:

    CREATE TABLE counts (
        hr MEDIUMINT UNSIGNED NOT NULL,
        ct SMALLINT UNSIGNED NOT NULL,
        PRIMARY KEY(hr)
    ) ENGINE=InnoDB;
    

    Initialize (one-time task):

    INSERT INTO counts  (hr, ct)
        SELECT  FLOOR(UNIX_TIMESTAMP(uploaded_on) / 3600),
                COUNT(*)
            FROM candidates_resume
            GROUP BY 1;
    

    As a new row is inserted into candidates_resume:

    INSERT INTO candidates_resume
        (hr, ct)
        VALUES
        (FLOOR(UNIX_TIMESTAMP(uploaded_on) / 3600), 1)
        ON DUPLICATE KEY UPDATE ct = ct + 1;
    

    When wanting the count:

    SELECT SUM(ct) FROM counts;
    

    That gives the count up to the start of the current hour. If you need the count up to the current second, add on a second query to count just the rows since the start of the hour.

    (There are a few loose ends to fix.)

    More discussion: http://mysql.rjweb.org/doc.php/summarytables