I have a MySql table where I want to get the count of rows where a given VARCHAR column has a numeric value (convertible to number, you know). Right now, I'm doing a simple REGEXP check on this field. Since this table is very large, I'm using a series of indexes to REGEXP as few rows as possible.
But this VARCHAR column is also indexed. Is there a clever hack of the MySql indexing algorithm that I can exploit to scan even fewer rows? :-/ This is an InnoDB table.
You may not like this, as you are probably already trying to avoid it, but rather than trying to do some clever trick, when I have had situations like this, I add an additional column that stores the varchar in an numeric column (updated using a trigger), and query on that.
But, there is a way I can see to do it (though I have never had a reason to do this in production), which is to exploit the fact that indexing will put the values in order, such that all that begin with a number are sequenced together.
Assuming a table like this:
CREATE TABLE `test_1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`text_or_number` varchar(255),
PRIMARY KEY (`id`),
KEY `test_1_idx` (`text_or_number`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
You can select only values starting with numbers by exploiting the order of utf8 characters - http://en.wikipedia.org/wiki/UTF-8#Examples
The lowest value before 0 in utf-8 is "/", and the highest after it is ":", so this should extract only values that start with a number:
select cast(text_or_number as unsigned)
from test_1
where text_or_number < ':'
and text_or_number > '/'
and cast(text_or_number as unsigned) > 0;
That could still contain values that start with a number, but do not end with one, which is why I have added the cast(...) > 0 clause, but I think mysql will be smart enough to run the where clauses in order, so hopefully it will only run the cast on the subset of rows that start with a numeric char.