Difference in regex between Python and Rubular?

In Rubular, I have created a regular expression:

(Prerequisite|Recommended): (\w|-| )*

In each of these lines it matches the text from the keyword up to, but not including, the first period after it:

Recommended: good comfort level with computers and some of the arts.

Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.

Here is a use of the regex in Python:

import re

note_re = re.compile(r'(Prerequisite|Recommended): (\w|-| )*', re.IGNORECASE)

def prereqs_of_note(note):
    match = note_re.match(note)
    if not match:
        return None
    return match.group(0)

Unfortunately, the code returns None instead of a match:

>>> import prereqs

>>> result = prereqs.prereqs_of_note("Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.")

>>> print result
None

What am I doing wrong here?

UPDATE: Do I need re.search() instead of re.match()?

Solution

You want re.search(), which scans the string for the first position where the pattern matches. re.match() only tries to apply the pattern at the very start of the string, and your string starts with "Summer.", so it returns None.

>>> import re
>>> s = """Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only."""
>>> note_re = re.compile(r'(Prerequisite|Recommended): ([\w -]*)', re.IGNORECASE)
>>> note_re.search(s).groups()
('Prerequisite', 'pre-freshman standing or permission of instructor')
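Applied to the function from the question, the fix might look like the sketch below. It assumes the function should still return the whole matched text, as the original did. The character class [\w -] stands in for (\w|-| )* so that the second group captures the whole phrase rather than only its last character.

```python
import re

# A character class instead of (\w|-| )* so group 2 holds the whole phrase.
note_re = re.compile(r'(Prerequisite|Recommended): ([\w -]*)', re.IGNORECASE)

def prereqs_of_note(note):
    """Return the matched 'Prerequisite: ...' text, or None if there is none."""
    match = note_re.search(note)  # search() scans; match() only tries index 0
    if not match:
        return None
    return match.group(0)

note = ("Summer. 2 credits. Prerequisite: pre-freshman standing or "
        "permission of instructor. Credit may not be applied toward "
        "engineering degree. S-U grades only.")

print(note_re.match(note))    # None, because the note starts with "Summer."
print(prereqs_of_note(note))  # Prerequisite: pre-freshman standing or permission of instructor
```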

Also, if you want to match past the first period following the word "instructor", you'll have to add a literal '.' to the character class. Keep the hyphen last so it stays a literal hyphen instead of forming a range:

>>> re.search(r'(Prerequisite|Recommended): ([\w .-]*)', s, re.IGNORECASE).groups()
('Prerequisite', 'pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.')
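One thing to watch when adding characters to a class: a hyphen placed between two characters is read as a range. The sketch below compares the two spellings. The names loose and strict and the sample string are made up for illustration.

```python
import re

# ' -\.' is a range from space (0x20) to period (0x2E), so it also
# covers characters such as '!', '(', ')' and '+'.
loose = re.compile(r'[\w -\.]*')
# With the hyphen last, the class is exactly: word characters, space, '.', '-'.
strict = re.compile(r'[\w .-]*')

sample = "CS 100 (or equivalent). S-U"

# match() is fine here because both patterns are tested from index 0.
print(repr(loose.match(sample).group(0)))   # 'CS 100 (or equivalent). S-U'
print(repr(strict.match(sample).group(0)))  # 'CS 100 '
```

Both spellings give the same result on the note in the question, since it contains only letters, spaces, hyphens and periods. They differ as soon as other punctuation appears.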

If what you really want is everything up to the end of the line, which seems to be the case, I would suggest simplifying the pattern and capturing the rest of the line with .*, which matches any character except a newline:

>>> re.search(r'(Prerequisite|Recommended): (.*)', s, re.IGNORECASE).groups()
('Prerequisite', 'pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.')

For this example, the previous pattern with the literal '.' added returns the same result as .* does.
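Because . does not match a newline by default, the .* version stops at the end of each line. Here is a minimal sketch of that behaviour on a hypothetical two-line note (the text is invented for this example):

```python
import re

rest_of_line = re.compile(r'(Prerequisite|Recommended): (.*)', re.IGNORECASE)

# Hypothetical two-line note, made up for this example.
text = "Prerequisite: CS 100. S-U grades only.\nRecommended: some of the arts."

# search() returns the first match; .* stops at the newline.
first = rest_of_line.search(text)
print(first.groups())  # ('Prerequisite', 'CS 100. S-U grades only.')

# findall() returns one (keyword, rest) tuple per matching line.
print(rest_of_line.findall(text))
# [('Prerequisite', 'CS 100. S-U grades only.'), ('Recommended', 'some of the arts.')]
```

If a note can wrap across lines and you want all of it, passing re.DOTALL makes . match newlines as well.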