Search code examples
mysqlquery-optimization

How Can I Optimize My MySQL Select Query?


I have a database table that contains over 90,000 rows.

Each row contains a word that is either an adjective, adverb, noun, or verb (as indicated by a Type column).

I need to run a MySQL query that returns a small number (e.g. 3 - 10) of randomly chosen nouns where the length of the word is between two supplied numbers (e.g. 4 - 8).

The performance of the queries I've tried is not that great.

Here's the table structure:

CREATE TABLE `words` (
  `id` int(11) NOT NULL,
  `type` char(1) COLLATE utf8_unicode_ci NOT NULL,
  `word` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `variations` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `free_writing_prompt` smallint(1) NOT NULL DEFAULT '0',
  `word_length` smallint(5) NOT NULL DEFAULT '0'
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

ALTER TABLE `words`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `word_type` (`type`,`word`),
  ADD KEY `type` (`type`),
  ADD KEY `Word Length` (`word_length`);

ALTER TABLE `words`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

The free_writing_prompt column is set to 0 or 1, 1 meaning it's a valid option for the query, and 0 meaning ignore it. At the moment, all of the nouns have this column set to 1, but the plan is to change many words to 0 because they are not nouns that should be returned.

Here's the row count, by Type:

Adjectives: 21,499
Adverbs: 4,475
Nouns: 58,670
Verbs: 8,978

This is the first query I tried:

SELECT  word
    FROM  words
    WHERE  type='n'
      AND  free_writing_prompt=1
      AND  CHAR_LENGTH(word)>=4
      AND  CHAR_LENGTH(word)<=8
    ORDER BY  RAND()
    LIMIT  3;

After this, I decided to add the word_length column, which was not originally part of the table definition, because using the CHAR_LENGTH function on all those rows seemed inefficient when I could pre-calculate and store that value instead.

I also thought it would be worthwhile building a Key index on the word_length column, but I'm not convinced it's really helped.

I then tried the following three queries, but they all still take about 16 seconds to run:

SELECT  word
    FROM  words
    WHERE  type='n'
      AND  free_writing_prompt=1
      AND  word_length>=4
      AND  word_length<=8
    ORDER BY  RAND()
    LIMIT  3;
SELECT  word
    FROM  words
    WHERE  type='n'
      AND  free_writing_prompt=1
      AND  word_length IN (
        SELECT  word_length
            FROM  words
            WHERE  word_length>=4
              AND  word_length<=8
                          )
    ORDER BY  RAND()
    LIMIT  3;
SELECT  word
    FROM  words
    WHERE  word_length IN (
        SELECT  word_length
            FROM  words
            WHERE  type='n'
              AND  free_writing_prompt=1
              AND  word_length>=4
              AND  word_length<=8
                          )
    ORDER BY  RAND()
    LIMIT  3;

I then tried this Union query, which was fractionally faster (but maybe not enough to be statistically significant):

SELECT * FROM (
    SELECT word FROM words WHERE type='n' AND free_writing_prompt=1 AND word_length=4
    UNION
    SELECT word FROM words WHERE type='n' AND free_writing_prompt=1 AND word_length=5
    UNION
    SELECT word FROM words WHERE type='n' AND free_writing_prompt=1 AND word_length=6
    UNION
    SELECT word FROM words WHERE type='n' AND free_writing_prompt=1 AND word_length=7
    UNION
    SELECT word FROM words WHERE type='n' AND free_writing_prompt=1 AND word_length=8
) a
ORDER BY RAND() LIMIT 3;

Does anybody have any ideas on how to optimize this query to improve the run time?


Solution

  • Composite index:

        WHERE  type='n'
          AND  free_writing_prompt=1
          AND  word_length>=4
          AND  word_length<=8
    

    begs for

    INDEX(type, free_writing_prompt,   -- in either order
          word_length)                 -- last
    

    Even better, is to stretch it into a covering index:

    INDEX(type, free_writing_prompt,   -- in either order
          word_length,                 -- last (for indexing purposes)
          word)                        -- to make "covering"
    

    Your first SELECT will work well with this covering index. Alas it won't be perfect since it will build a temp table of several thousand, sort it, then peel off 3 rows.