
Difference in regex between Python and Rubular?


In Rubular, I have created a regular expression:

(Prerequisite|Recommended): (\w|-| )*

It matches the portion of each line below that runs from the keyword up to the first period:

Recommended: good comfort level with computers and some of the arts.

Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.

Here is a use of the regex in Python:

import re

note_re = re.compile(r'(Prerequisite|Recommended): (\w|-| )*', re.IGNORECASE)

def prereqs_of_note(note):
    match = note_re.match(note)
    if not match:
        return None
    return match.group(0)

Unfortunately, the code returns None instead of a match:

>>> import prereqs
>>> result = prereqs.prereqs_of_note("Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.")
>>> print(result)
None

What am I doing wrong here?

UPDATE: Do I need re.search() instead of re.match()?


Solution

  • Use re.search(), which scans through the string for the first place the pattern matches. re.match() only tries the pattern at the very start of the string, and your string starts with "Summer.", so it returns None.

    >>> import re
    >>> s = """Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only."""
    >>> note_re = re.compile(r'(Prerequisite|Recommended): ([\w -]*)', re.IGNORECASE)
    >>> note_re.search(s).groups()
    ('Prerequisite', 'pre-freshman standing or permission of instructor')
    

    Also, if you want to match past the first period following the word "instructor", you need to add a literal '.' to the character class. Put the hyphen last so that it is taken literally rather than as a range operator (' -\.' would mean every character from space through period):

    >>> re.search(r'(Prerequisite|Recommended): ([\w .-]*)', s, re.IGNORECASE).groups()
    ('Prerequisite', 'pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.')
    

    If what you actually want is everything after the keyword, I would suggest a more permissive pattern that matches the rest of the line with .* (which, by default, stops at a newline):

    >>> re.search(r'(Prerequisite|Recommended): (.*)', s, re.IGNORECASE).groups()
    ('Prerequisite', 'pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.')
    

    For this example, the previous pattern with the added literal '.' returns the same result as .* does.
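
Putting the pieces together, here is a minimal sketch of the question's function rewritten around search(). The names NOTE_RE and notes are introduced here for illustration only, and the pattern is the character-class version from the first example above. Treat it as one possible arrangement, not the only correct one.

```python
import re

# Pattern from the answer's first example: keyword, colon, space, then any run
# of word characters, spaces, and hyphens (so it stops at the first period).
NOTE_RE = re.compile(r'(Prerequisite|Recommended): ([\w -]*)', re.IGNORECASE)


def prereqs_of_note(note):
    """Return the first 'Prerequisite: ...' or 'Recommended: ...' phrase, or None."""
    # search() scans the whole string; match() would only try position 0.
    found = NOTE_RE.search(note)
    return found.group(0) if found else None


# The two sample notes from the question.
notes = [
    "Recommended: good comfort level with computers and some of the arts.",
    "Summer. 2 credits. Prerequisite: pre-freshman standing or permission of "
    "instructor. Credit may not be applied toward engineering degree. "
    "S-U grades only.",
]

for note in notes:
    print(prereqs_of_note(note))

# match() finds the keyword only when the note begins with it.
print(NOTE_RE.match(notes[0]) is not None)
print(NOTE_RE.match(notes[1]) is not None)
```

The first note happens to start with the keyword, so match() would have worked on it. The second does not, which is why the original function returned None for it.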