Search code examples
sqldatabasecomplexity-theorysql-like

SQL `LIKE` complexity


Does anyone know what the complexity is for the SQL LIKE operator for the most popular databases?


Solution

  • Let's consider the three core cases separately. This discussion is MySQL-specific, but might also apply to other DBMS due to the fact that indexes are typically implemented in a similar manner.

    LIKE 'foo%' is quick if run on an indexed column. MySQL indexes are a variation of B-trees, so when performing this query it can simply descend the tree to the node corresponding to foo, or the first node with that prefix, and traverse the tree forward. All of this is very efficient.

    LIKE '%foo' can't be accelerated by indexes and will result in a full table scan. If you have other criterias that can by executed using indices, it will only scan the the rows that remain after the initial filtering.

    There's a trick though: If you need to do suffix matching - searching for file names with extension .foo, for instance - you can achieve the same performance by adding a column with the same contents as the original one but with the characters in reverse order.

    ALTER TABLE my_table ADD COLUMN col_reverse VARCHAR (256) NOT NULL;
    ALTER TABLE my_table ADD INDEX idx_col_reverse (col_reverse);
    UPDATE my_table SET col_reverse = REVERSE(col);
    

    Searching for rows with col ending in .foo then becomes:

    SELECT * FROM my_table WHERE col_reverse LIKE 'oof.%'
    

    Finally, there's LIKE '%foo%', for which there are no shortcuts. If there are no other limiting criterias which reduces the amount of rows to a feasible number, it'll cause a hard performance hit. You might want to consider a full text search solution instead, or some other specialized solution.