I am trying to make more use of regex in my search engine. Please take a look:
someStr = "Processor AMD Athlon II X4 651K BOX Black Edition, s. FM1, 3.0GHz, 4MB cache, Quad Core"
# THIS SHOULD MATCH: "processors" (plural) is allowed via 0 to 1 extra char,
# "mega" and "mb" should be treated the same,
# and "quad" may be followed by 0 to 2 characters other than whitespace
queryListTrue = ["processors", "amd", "4mega", "quaddy"]
# THIS SHOULDN'T MATCH: the last item is too long ("quad" + 3 chars)
queryListFalse = ["processors", "amd", "4mb", "quaddie"]
# PSEUDOCODE, TO DESCRIBE WHAT I NEED
rulesList = [r'processor[from 0 to 1 any char]', r'amd',
             r'4mega or 4mb', r'quad[from 0 to 2 any char]']
if ALL queryListTrue MATCHES someStr THRU rulesList:
    print "What a wonderful world!"
Any help would be wonderful.
The regular expression for "[from 0 to 1 any char]" is simply `.?`: the dot `.` matches any character (except newline, by default) and the `?` quantifier makes the preceding expression optional.
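A quick way to check this is `re.fullmatch` in Python 3 (the question's `print` statement is Python 2 syntax), which succeeds only when the whole word fits the pattern. The test words below are made up for illustration.

```python
import re

# `.` matches any single character except a newline; `?` makes it optional,
# so `.?` means "zero or one arbitrary character".
pattern = re.compile(r"processor.?")

for word in ["processor", "processors", "processorXY"]:
    # fullmatch() succeeds only if the entire word fits the pattern
    print(word, bool(pattern.fullmatch(word)))
# processor True
# processors True
# processorXY False

# By default the dot refuses a newline; the re.DOTALL flag lifts that restriction.
print(bool(pattern.fullmatch("processor\n")))                        # False
print(bool(re.fullmatch(r"processor.?", "processor\n", re.DOTALL)))  # True
```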
Note that `processor.?` will also match a space after `processor`, or an arbitrary character such as the `d` in `processord`. You probably intend `processors?`, where the plural `s` is optional, or perhaps `processor[a-z]?` to constrain the optional last character to a lowercase ASCII letter.
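To see how differently the three candidate rules behave, here is a small Python 3 sketch that runs each of them over a handful of made-up words, again using `re.fullmatch` so the whole word must fit.

```python
import re

# Three candidate rules for an optional character after "processor".
patterns = {
    "processor.?": re.compile(r"processor.?"),          # any one character, or none
    "processors?": re.compile(r"processors?"),          # only an optional plural "s"
    "processor[a-z]?": re.compile(r"processor[a-z]?"),  # one optional lowercase letter
}

# Hypothetical test words: bare, plural, trailing space, trailing letter, trailing digit.
words = ["processor", "processors", "processor ", "processord", "processor1"]

for name, rx in patterns.items():
    matched = [w for w in words if rx.fullmatch(w)]
    print(name, matched)
```

The first rule accepts all five words, the second only `processor` and `processors`, and the third also accepts `processord` but rejects the trailing space and the digit.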
Similarly, the generalized quantifier `{m,n}` means "at least m and at most n repetitions", so your "[from 0 to 2 any char]" translates to the regex `.{0,2}`.
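Applied to the question's "quad" rule, a Python 3 sketch shows that `quaddy` passes while `quaddie` (three extra characters) does not. The question also says the extra characters should not be whitespace; `\S` ("any non-whitespace character") is one way to express that, shown here as an optional variant.

```python
import re

# `.{0,2}` means "between zero and two arbitrary characters".
quad = re.compile(r"quad.{0,2}")

for word in ["quad", "quads", "quaddy", "quaddie"]:
    print(word, bool(quad.fullmatch(word)))
# quad True
# quads True
# quaddy True
# quaddie False   (three extra characters is one too many)

# Variant: \S{0,2} allows up to two characters, none of which may be whitespace.
quad_no_space = re.compile(r"quad\S{0,2}")
print(bool(quad.fullmatch("quad x")))           # True: "." accepts the space
print(bool(quad_no_space.fullmatch("quad x")))  # False: \S rejects the space
```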
Alternation in regular expressions is specified with `|`, so `mega|mb` is the regex formulation of your "mega or mb". If you use alternation in a longer pattern where some of the text is not subject to it, you need parentheses to scope the alternation, like `m(ega|b)`. Without them, `4mega|mb` means "4mega" or "mb", not "4mega" or "4mb".
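The effect of the parentheses is easy to check in Python 3. This sketch compares a scoped and an unscoped version of the "4mega or 4mb" rule on a few example strings.

```python
import re

scoped = re.compile(r"4m(ega|b)")   # "4m" followed by either "ega" or "b"
unscoped = re.compile(r"4mega|mb")  # the | splits the whole pattern: "4mega" or "mb"

for word in ["4mega", "4mb", "mb"]:
    print(word, bool(scoped.fullmatch(word)), bool(unscoped.fullmatch(word)))
# 4mega True True
# 4mb True False
# mb False True
```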
In Python (as in most modern Perl-derived regex dialects), you can use `(?:...)` instead of `(...)` when you need only the grouping, not the capturing behavior of regular parentheses.
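Putting the pieces together, here is one possible Python 3 translation of the question's pseudocode. It rests on an assumption the question leaves open: each query word is checked against the rule at the same position (`someStr` itself is not consulted), with case ignored. `all_match` is a hypothetical helper name. The first two lines also show what `(?:...)` changes: both forms scope the alternation, but only the capturing group reports what it matched.

```python
import re

# Capturing vs. non-capturing group: both scope the alternation,
# but only (...) records the matched text.
print(re.fullmatch(r"4m(ega|b)", "4mb").groups())    # ('b',)
print(re.fullmatch(r"4m(?:ega|b)", "4mb").groups())  # ()

# Regex versions of the question's rulesList.
rulesList = [r"processor.?", r"amd", r"4m(?:ega|b)", r"quad.{0,2}"]

def all_match(queries, rules):
    """Hypothetical helper: True if query i fully matches rule i for every i.

    Assumption: case should not matter, so re.IGNORECASE is used.
    """
    return len(queries) == len(rules) and all(
        re.fullmatch(rule, query, re.IGNORECASE)
        for query, rule in zip(queries, rules)
    )

queryListTrue = ["processors", "amd", "4mega", "quaddy"]
queryListFalse = ["processors", "amd", "4mb", "quaddie"]

if all_match(queryListTrue, rulesList):
    print("What a wonderful world!")
print(all_match(queryListFalse, rulesList))  # False: "quaddie" has 3 extra chars
```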