Search code examples
mysqlregexdjangodjango-modelsdjango-mysql

djagno filter regex not working while python re works


I am trying to do iregex match in django for the regular expression

reg_string = (\w|\d|\b|\s)+h(\w|\d|\b|\s)+(\w|\d|\b|\s)+anto(\w|\d|\b|\s)+
self.queryset.filter(name__iregex=r"%s"%(reg_string,))

by using the word "The Canton" for name but its not returning any value but while using it in python re.search its working

print (re.search(r'(\w|\d|\b|\s)+h(\w|\d|\b|\s)+(\w|\d|\b|\s)+anto(\w|\d|\b|\s)+', 'The Canton', re.I).group()

I am using Mysql 5.7, any one know how to fix this


Solution

  • Note that MySQL REGEXP does not support shorthand character classes like \s, \d, \w, etc. It supports some basic POSIX character classes like [:digit:], [:alpha:], [:alnum:], etc.

    Even if you keep on using the pattern in Python, you should not write (\w|\d|\b|\s)+ as it matches and captures a single char that is a word char or digit, word boundary, or whitespace, 1 or more times (and rewriting buffer of Group N with the latest char the engine matched). You could rewrite that with a single character class - [\w\s]+.

    Now, your pattern in MySQL will look like

    [_[:alnum:][:space:]]+h[_[:alnum:][:space:]]+anto[_[:alnum:]‌​[:space:]]+
    

    where [\w\s]+ is turned into [_[:alnum:][:space:]]+:

    • [ - start of a bracket expression
    • _ - an underscore (as \w matches _ and [:alnum:] does not)
    • [:alnum:] - an alphanuemric char
    • [:space:] - any whitespace char
    • ] - end of the bracket expression
    • + - quantifier, 1 or more times.