Search code examples
mysqlindexingquery-optimization

Why doesn't MySql automatically optimises BETWEEN query?


I have two query for same output

Slow Query:

SELECT 
    *
FROM
    account_range
WHERE
    is_active = 1 AND '8033576667466317' BETWEEN range_start AND range_end;

Execution Time: ~800 ms.

Explain:

+----+-------------+---------------+------------+------+-------------------------------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table         | partitions | type | possible_keys                             | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+---------------+------------+------+-------------------------------------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | account_range | NULL       | ALL  | range_start,range_end,range_se_active_idx | NULL | NULL    | NULL | 940712 |     2.24 | Using where |
+----+-------------+---------------+------------+------+-------------------------------------------+------+---------+------+--------+----------+-------------+

Very Fast Query: learnt from here

SELECT 
    *
FROM
    account_range
WHERE
    is_active = 1 AND 
    range_start = (SELECT 
            MAX(range_start)
        FROM
            account_range
        WHERE
            range_start <= '8033576667466317') AND 
    range_end = (SELECT 
            MIN(range_end)
        FROM
            account_range
        WHERE
            range_end >= '8033576667466317')

Execution Time: ~1ms

Explain:

+----+-------------+---------------+------------+------+-------------------------------------------+---------------------+---------+-------------------+------+----------+------------------------------+
| id | select_type | table         | partitions | type | possible_keys                             | key                 | key_len | ref               | rows | filtered | Extra                        |
+----+-------------+---------------+------------+------+-------------------------------------------+---------------------+---------+-------------------+------+----------+------------------------------+
|  1 | PRIMARY     | account_range | NULL       | ref  | range_start,range_end,range_se_active_idx | range_se_active_idx | 125     | const,const,const |    1 |   100.00 | NULL                         |
|  3 | SUBQUERY    | NULL          | NULL       | NULL | NULL                                      | NULL                | NULL    | NULL              | NULL |     NULL | Select tables optimized away |
|  2 | SUBQUERY    | NULL          | NULL       | NULL | NULL                                      | NULL                | NULL    | NULL              | NULL |     NULL | Select tables optimized away |
+----+-------------+---------------+------------+------+-------------------------------------------+---------------------+---------+-------------------+------+----------+------------------------------+

Table Structure:

CREATE TABLE account_range (
    id int(11) unsigned NOT NULL AUTO_INCREMENT,
    range_start varchar(20) NOT NULL,
    range_end varchar(20) NOT NULL,
    is_active tinyint(1) NOT NULL,
    bank_name varchar(100) DEFAULT NULL,
    addedon timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updatedon timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    description text,
    PRIMARY KEY (id),
    KEY range_start (range_start),
    KEY range_end (range_end),
    KEY range_se_active_idx (range_start , range_end , is_active)
)  ENGINE=InnoDB AUTO_INCREMENT=946132 DEFAULT CHARSET=utf8;

Please do explain Why doesn't MySql automatically optimizes BETWEEN query?

Update:
Realised my mistake from @kordirko answer. My table contains only non-overlapping ranges, so both queries are returning same results.


Solution

  • Such a comparision doesn't make sense, since you are comparing apples to oranges.

    These two queries are not eqivalent, they give different resuts,
    thus MySql optimises them in a different way and their plans can differ.

    See this simple example: http://sqlfiddle.com/#!9/98678/2

    create table account_range(
      is_active int,
      range_start int,
      range_end int
     );
    
     insert into account_range values
     (1,-20,100), (1,10,30);
    

    First query gives 2 rows:

    select * from account_range
     where is_active = 1 and 25 between range_start AND range_end;
    
    | is_active | range_start | range_end |
    |-----------|-------------|-----------|
    |         1 |         -20 |       100 |
    |         1 |          10 |        30 |
    

    Second query gives only 1 row:

    SELECT * FROM account_range
    WHERE
        is_active = 1 AND 
        range_start = (SELECT MAX(range_start)
                       FROM account_range
                       WHERE range_start <= 25
        ) AND 
        range_end = (SELECT MIN(range_end)
                     FROM account_range
                     WHERE range_end >= 25
        )
    
    | is_active | range_start | range_end |
    |-----------|-------------|-----------|
    |         1 |          10 |        30 |
    

    To speed this query up (the first one), two bitmap indexes can be used together with "bitmap and" operation - but MySql doesn't have such a feature.

    Another option is a spatial index (for example GIN indexes in PostgreSql: http://www.postgresql.org/docs/current/static/textsearch-indexes.html).

    And another option is a star transformation (or a star schema) - you need to "divide" this table into two "dimension" or "measures" tables and one "fact" table .... but this is too broad topic, if you want know more you can start from here: https://en.wikipedia.org/wiki/Star_schema