Why doesn't python regex search method consistently return the matched object correctly?


I am doing a practice question on a Regex course:

How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:

  • Alice eats apples.
  • Bob pets cats.
  • Carol throws baseballs.
  • Alice throws Apples.
  • BOB EATS CATS.

My code is as follows:

import re

regex = re.compile(r'Alice|Bob|Carol\seats|pets|throws\sapples\.|cats\.|baseballs\.', re.IGNORECASE)
mo = regex.search(str)
ma = mo.group()

When I pass str = 'BOB EATS CATS.' or str = 'Alice throws Apples.', mo.group() returns only 'BOB' or 'Alice' respectively, but I was expecting it to return the whole sentence.

When I pass str = 'Carol throws baseballs.', mo.group() returns 'baseballs.', which is the last word of the sentence rather than the first.

I am confused as to why:

  • For the first two str examples it returned the first word ('BOB' or 'Alice'), whilst for the third it returned the last word ('baseballs.').

  • In all three examples, mo.group() does not return the entire sentence as the match. For example, I was expecting 'Carol throws baseballs.' as the output of mo.group().
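To reproduce the behaviour described above, a minimal script such as the following prints each match together with its start and end positions from mo.span(). It uses the illustrative names `original` and `text` (instead of `str`, which would shadow the built-in type).

```python
import re

# The pattern from the question, unchanged.
original = re.compile(
    r'Alice|Bob|Carol\seats|pets|throws\sapples\.|cats\.|baseballs\.',
    re.IGNORECASE,
)

for text in ['BOB EATS CATS.', 'Alice throws Apples.', 'Carol throws baseballs.']:
    mo = original.search(text)
    # span() gives the (start, end) indices of the match within text
    print(repr(text), '->', repr(mo.group()), mo.span())
```

The spans show that the first two matches start at index 0, while the match in 'Carol throws baseballs.' starts at index 13.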


Solution

  • You need to tell your regex how to group the lists of options. The | operator has the lowest precedence of any regex operator, so without parentheses it splits the entire pattern into one giant list of alternatives, some of which happen to contain spaces. The easiest way is to use a capture group for each word:

    regex=re.compile(r'(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.', re.IGNORECASE)
    

    The trailing period shouldn't be part of an option. If you don't want to use capturing groups for some reason (it won't really affect how the match is made), you can use non-capturing groups instead. Replace (...) with (?:...).

    Your original regex was interpreted as the following set of options:

    • Alice
    • Bob
    • Carol\seats
    • pets
    • throws\sapples\.
    • cats\.
    • baseballs\.

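    Spaces don't separate groups of options. Hopefully you can see why, of all the words in 'Carol throws baseballs.', only 'baseballs.' appears in that list. re.search scans the string from left to right and returns the match at the leftmost position where any alternative succeeds (at a given position, the alternatives are tried in order). For 'BOB EATS CATS.' and 'Alice throws Apples.' that position is the very first word, whereas for 'Carol throws baseballs.' nothing matches until 'baseballs.'. Conversely, something like 'Carol eats baseballs.' would match just 'Carol eats'.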
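A minimal sketch for checking the grouped pattern against the five sentences the exercise says it must match, and for contrasting it with the original ungrouped pattern. The names `pattern`, `must_match` and `original` are illustrative only.

```python
import re

# Grouped pattern from the answer: each list of options is in its own group,
# and the period sits outside the groups.
pattern = re.compile(
    r'(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.',
    re.IGNORECASE,
)

must_match = [
    'Alice eats apples.',
    'Bob pets cats.',
    'Carol throws baseballs.',
    'Alice throws Apples.',
    'BOB EATS CATS.',
]

for sentence in must_match:
    mo = pattern.search(sentence)
    # group() is the whole match; groups() holds the three captured words
    print(repr(mo.group()), mo.groups())

# The original, ungrouped pattern: the leftmost alternative that succeeds wins.
original = re.compile(
    r'Alice|Bob|Carol\seats|pets|throws\sapples\.|cats\.|baseballs\.',
    re.IGNORECASE,
)
print(original.search('Carol eats baseballs.').group())  # 'Carol eats'
```

Note that pattern.search also finds such a sentence inside longer text; pattern.fullmatch(sentence) could be used instead if the whole string must be exactly one sentence.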