I am using RLIKE to find some email domains with mysql.
Here is the Query:
SELECT something
FROM table1
WHERE SUBSTRING_INDEX(table1.email, "@", -1) RLIKE "test1.com"|"test2.com"
This matched all the email domains with numbers in, example:
aaa@domain0.com
Any idea why?
EDIT: I also noticed that it finds email domains that has at least two successive digits.
Really strange.
The string supplied to RLIKE
or REGEXP
needs to be a quoted string, wherein the entire regular expression is single-quoted. What you have are two double-quoted strings separated by |
, which is the bitwise OR operator.
That is causing the whole expression to be evaluated as 0
, and that's why the domain aaa@domain0.com
is matched:
# The unquoted | evaluates this to zero:
mysql> SELECT "string" | "string";
+---------------------+
| "string" | "string" |
+---------------------+
| 0 |
+---------------------+
# And zero matches domain0.com
mysql> SELECT 'domain0.com' RLIKE '0';
+-------------------------+
| 'domain0.com' RLIKE '0' |
+-------------------------+
| 1 |
+-------------------------+
Instead, you would need to use RLIKE
with a single-quoted string, and backslash-escape the .
. I'm also adding ^$
anchors so substrings are not matched.
WHERE SUBSTRING_INDEX(table1.email, "@", -1) RLIKE '^test1\.com$|^test2\.com$'
It could also be expressed as '^(test1\.com|test2\.com)$'
. The trick is that |
has very low precedence so you need to ensure both ends are anchored for every possible string you want to match.
However, if you are just trying to match a list of domains, it is far easier to do it with IN ()
so you may merely list them:
WHERE SUBSTRING_INDEX(table1.email, "@", -1) IN ('test1.com', 'test2.com', 'test4.org')