I have a simple table (created by django) - engine InnoDB:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| correlation | double | NO | | NULL | |
| gene1_id | int(10) unsigned | NO | MUL | NULL | |
| gene2_id | int(10) unsigned | NO | MUL | NULL | |
+-------------+------------------+------+-----+---------+----------------+
The table has more than 411 million rows. (The target table will have around 461M rows, 21471*21470 rows)
My main query looks like this, there might be up to 10 genes specified at most.
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
This query is very slow, it takes almost 2 mins to run:
21471 rows in set (1 min 11.03 sec)
Indexes (cardinality looks strange - too small?):
Non_unique| Key_name | Seq_in_index | Column_name | Collation | Cardinality |
0 | PRIMARY | 1 | id | A | 411512194 |
1 | c_gene1_id_6b1d81605661118_fk_genes_gene_entrez | 1 | gene1_id | A | 18 |
1 | c_gene2_id_2d0044eaa6fd8c0f_fk_genes_gene_entrez | 1 | gene2_id | A | 18 |
I just run select count(*) on that table and it took 22 mins:
select count(*) from predictions_genescorrelation;
+-----------+
| count(*) |
+-----------+
| 411512002 |
+-----------+
1 row in set (22 min 45.05 sec)
What could be wrong? I suspect that mysql configuration is not set up right.
During the import of data I experienced problem with space, so that might also affected the database, although I ran check table
later - it took 2hours and stated OK.
Additionally - the cardinality of the indexes look strange. I have set up smaller database locally and there values are totally different (254945589,56528,17).
Should I redo indexes? What params should I check of MySQL? My tables are set up as InnoDB, would MyISAM make any difference?
Thanks, matali
https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/
SELECT COUNT(*)
queries are very slow without WHERE
clause or without SELECT COUNT(id)
... USE INDEX (PRIMARY)
.
to speedup this:
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
you should have composite key on (gene2_id, gene1_id, correlation) in that order. try
About index-cardinality: stats of Innodb tables are approximate, not accurate (sometimes insane). there even was (IS?) a bug-report https://bugs.mysql.com/bug.php?id=58382
Try to ANALIZE table and watch cardinality again