I have one table that contains documents, and on production there are about 1.2 millon records in this table. On this table when I do select count(*) from <table>
, it takes too long that at the end I need to restart the DB. On the other hand I also have many other table containing 10-12 million rows but those tables does not have this issue.
These are indexes of that table
mysql> show index from candidates_resume\G
*************************** 1. row ***************************
Table: candidates_resume
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 843657
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 2. row ***************************
Table: candidates_resume
Non_unique: 0
Key_name: candidate_id
Seq_in_index: 1
Column_name: candidate_id
Collation: A
Cardinality: 844009
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 3. row ***************************
Table: candidates_resume
Non_unique: 1
Key_name: candidates_resume_uploaded_on_e4c78158b8c18f_uniq
Seq_in_index: 1
Column_name: uploaded_on
Collation: A
Cardinality: 844009
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 4. row ***************************
Table: candidates_resume
Non_unique: 1
Key_name: candidates_resume_pdf_file_5b052603240d1d43_uniq
Seq_in_index: 1
Column_name: pdf_file
Collation: A
Cardinality: 844009
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 5. row ***************************
Table: candidates_resume
Non_unique: 1
Key_name: candidates_resume_watermark_file_68fd6000f27d4f8d_uniq
Seq_in_index: 1
Column_name: watermark_file
Collation: A
Cardinality: 844009
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
And this is result of SHOW CREATE TABLE
Create Table: CREATE TABLE `candidates_resume` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(50) NOT NULL,
`uploaded_on` datetime NOT NULL,
`candidate_id` int(11) NOT NULL,
`file` varchar(100) NOT NULL,
`hash` varchar(10) NOT NULL,
`pdf_file` varchar(100) DEFAULT NULL,
`resume_text` longtext NOT NULL,
`watermark_file` varchar(100) DEFAULT NULL,
`html_file` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `candidate_id` (`candidate_id`),
KEY `candidates_resume_uploaded_on_e4c78158b8c18f_uniq` (`uploaded_on`),
KEY `candidates_resume_pdf_file_88ec1f31_uniq` (`pdf_file`),
KEY `candidates_resume_watermark_file_23af2d43_uniq` (`watermark_file`),
CONSTRAINT `candidate_id_refs_id_88f99c34` FOREIGN KEY (`candidate_id`) REFERENCES `candidates_candidate` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=591098 DEFAULT CHARSET=utf8
Can anyone guide me how can I catch the issue with this table ?
SELECT COUNT(*) FROM ...
without any filtering (WHERE
) must scan the entire table or an index. This takes time.
Do EXPLAIN SELECT ...
to see how it is handled. I think it will use your UNIQUE(candidate_id)
. (Please provide SHOW CREATE TABLE
.)
Assuming that candidate_id
is INT
or BIGINT
, the query can't be run much faster.
Why do you need to count the number of rows. Would an estimate be "good enough"? If so, see SHOW TABLE STATUS
or the equivalent query in information_schema
.
If the count from midnight this morning would be "good enough", then perform that and save it somewhere.
If you can't figure how to avoid timeout, see wait_timeout
. Caution; there are several flavors of it.
With a Summary Table
Build and maintain a table that keeps, say, the hourly counts of rows:
CREATE TABLE counts (
hr MEDIUMINT UNSIGNED NOT NULL,
ct SMALLINT UNSIGNED NOT NULL,
PRIMARY KEY(hr)
) ENGINE=InnoDB;
Initialize (one-time task):
INSERT INTO counts (hr, ct)
SELECT FLOOR(UNIX_TIMESTAMP(uploaded_on) / 3600),
COUNT(*)
FROM candidates_resume
GROUP BY 1;
As a new row is inserted into candidates_resume
:
INSERT INTO candidates_resume
(hr, ct)
VALUES
(FLOOR(UNIX_TIMESTAMP(uploaded_on) / 3600), 1)
ON DUPLICATE KEY UPDATE ct = ct + 1;
When wanting the count:
SELECT SUM(ct) FROM counts;
That gives the count up to the start of the current hour. If you need the count up to the current second, add on a second query to count just the rows since the start of the hour.
(There are a few loose ends to fix.)
More discussion: http://mysql.rjweb.org/doc.php/summarytables