mysql sql query-optimization sql-like calculated-columns

MySQL virtual column and wildcard

I was trying MySQL secondary indexing referring to MySQL Documentation, and weird thing happened.

Firstly, I created a table with small modification per the example in the document

create table jemp(
c JSON,
g VARCHAR(20) GENERATED ALWAYS AS (c->"$.name"),
INDEX i (g)
)

Secondly, I inserted values per the example in the document

INSERT INTO jemp (c) VALUES
('{"id": "1", "name": "Fred"}'), ('{"id": "2", "name": "Wilma"}'),
('{"id": "3", "name": "Barney"}'), ('{"id": "4", "name": "Betty"}');

And then, I tried to perform a fuzzy search with "like" and "wildcard". This doesn't work because index doesn't support prefix %, but it can get result.

select c->"$.name" as name from jemp where g like "%F%"

Here is the weird thing, I removed the prefix %, and index did work. However, I didn't get any results. Per my poor understanding of MySQL, this should work.

select c->"$.name" as name from jemp where g like "F%"

I would be so much appreciate if anyone could help me with it.

Solution

For your query to work, you want a generated column that extracts the name as text rather than JSON. That is, use ->> instead of ->:

g VARCHAR(20) GENERATED ALWAYS AS (c ->> '$.name')

Then: the index may help for both following conditions:

where g like 'F%'
where g = 'F'

Whether MySQL decides to use it or not is another story; basically the databases assesses whether using the index will be faster than a full scan. If it believes that the condition will match on a large number of rows, it will probably choose to full scan.

Note that I consistently use single quotes for string literals; although MySQL tolerates otherwise, this is what the SQL standard specifies. In some other databases, double quotes stand for identifiers (this also is compliant with the standard).