I have the following SQL query:
SELECT DISTINCT business_key
FROM Memory
WHERE concept <> 'case' OR attrib <> 'status' OR value <> 'closed'
What I try to achieve is to get all unique business keys that don't have a record concept=case AND attrib=status AND value=closed. Running this query in MySQL with 500 000 records with all unique business_keys is very slow: about 11 seconds.
I placed indices to the business_key column, to the concept, attrib and value columns. I also tried with a combined index to all three columns (concept, attrib, value) but the result is the same.
Here is a screenshot of the EXPLAIN EXTENDED
command:
The interesting thing is that running the query without the distinct specifier results in a very fast execution.
I had also tried this:
SELECT DISTINCT m.business_key
FROM Memory m
WHERE m.business_key NOT IN
(SELECT c.business_Key
FROM Memory c
WHERE c.concept = 'case' AND c.attrib = 'status' AND c.value = 'closed')
with even worse results: around 25 seconds
You could add a compound (concept, attrib, value, business_key)
index so the query (if MySQL decides to use this index) can find all the info in the index without having to read the whole table.
Your query is equivalent to:
SELECT DISTINCT business_key
FROM Memory
WHERE NOT (concept = 'case' AND attrib = 'status' AND value = 'closed')
and to this (which will probably yield the same execution plan):
SELECT business_key
FROM Memory
WHERE NOT (concept = 'case' AND attrib = 'status' AND value = 'closed')
GROUP BY business_key
Since the 4 columns that are to be put in the index are all VARCHAR(255)
, the index length will be pretty large. MyISAM will not allow more than 1000 bytes and InnoDB no more than 3072.
One solution is to cut the length of the last part, making the index length less than 1000: 255+255+255+230 = 995
:
(concept, attrib, value, business_key(220))
It will work but it's really not good to have so large index lengths, performance wise.
Another option is to lower the length of all or some of those 4 columns, if that complies with the data you expect to store there. No need to declare length 255
if you expect to have maximum of 100
in a column.
Another option you may consider is putting those 4 columns in 4 separate reference tables. (Or just the columns that have repeated data. It seems that business_key
will have duplicate data but not that many. So, it won't be much good to make a reference table for that column.)
Example: Put concept
values in a new table with something like:
CREATE TABLE Concept_Ref
( concept_id INT AUTO_INCREMENT
, concept VARCHAR(255)
, PRIMARY KEY concept_id
, UNIQUE INDEX concept_idx (concept)
) ;
INSERT INTO Concept_Ref
( concept )
SELECT DISTINCT
concept
FROM
Memory ;
and then change the Memory
table with:
ALTER TABLE Memory
ADD COLUMN concept_id INT ;
do this (once):
UPDATE
Memory m
JOIN
Concept_Ref c
ON c.concept = m.concept
SET m.concept_id = c.concept_id
and then drop the Memory.concept
column:
ALTER TABLE Memory
DROP COLUMN concept ;
You can also add FOREIGN KEY
references if you change your tables from MyISAM to InnoDB.
After doing the same for all 4 columns, not only the length of new compound index in the Memory
table will be much smaller but your table size will be much smaller, too. Additionally, any other index that uses any of those columns will have smaller length.
Off course the query would need 4 JOINs to be written. And any INSERT
, UPDATE
or DELETE
statement to this table will have to be changed and carefully designed.
But overall, I think you will have better performance. With the design you have now, it seems that values like 'case'
, 'status'
and 'closed'
are repeated many times.