Python re expression returning whole string, but groups not providing whole string

I'm trying to create a regular expression in python that will match certain elements of a user-inputted string. So far, that is re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd).

When user_cmd = ' 12 0b110110 \' \' " " "str" \'str\'', re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd) returns <re.Match object; span=(0, 32), match=' 12 0b110110 \' \' " " "str" \'str\''>, which is the whole string. Because everything is matched, and everything in the regex is in parentheses, everything should be in a group, right? It turns out not: re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd).groups() returns (" 'str'",) (only one item). Why is this? How do I make the regular expression return every item it matched from groups()?

Solution

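Your pattern repeats a capture group. A repeated capture group keeps only the value of its last iteration, so group 1 ends up holding " 'str'" (including the leading space) even though the overall match covers the whole string.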
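A minimal sketch of that behaviour, reusing the pattern and input from the question (the variable names repeated, whole and last are only for illustration):

```python
import re

user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
repeated = "( 0b[10]+| [0-9]+| '.+?'| \".+?\")+"

m = re.match(repeated, user_cmd)
whole = m.group(0)   # the entire matched text
last = m.groups()    # group 1 holds only its final iteration

print(whole == user_cmd)  # True
print(last)               # (" 'str'",)
```

Each iteration of the group overwrites the previous capture, which is why groups() has one item however many times the group repeats.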

For your matches, you don't need to repeat a group if you want the separate matches, and you don't need a capture group if you want the matches only.

Since all the parts start with a space, you can match a space and then use a non-capturing group for the alternation |.

Instead of the non-greedy quantifier .+? you can use a negated character class, which reduces backtracking and cannot run past the closing quote.

 (?:0b[10]+|[0-9]+|'[^']+'|"[^"]+")

 (?: Match a literal space, then start a non-capturing group for the alternation |
- 0b[10]+ Match 0b and 1+ occurrences of 1 or 0
- | Or
- [0-9]+ Match 1+ digits 0-9
- | Or
- '[^']+' Match from ' up to the next ' using a negated character class, which matches 1+ occurrences of any character except '
- | Or
- "[^"]+" Match from " up to the next " using another negated character class
) Close the non-capturing group
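To illustrate why the negated character class is the safer choice, here is a small sketch. The input text and the trailing x in both patterns are made up for this example; they only serve to force the regex engine to backtrack.

```python
import re

text = "'a' 'b'x"

# Lazy dot: when the x is not found after 'a', .+? keeps expanding across quotes.
lazy = re.search(r"'.+?'x", text)

# Negated class: [^'] can never consume a quote, so a match stays inside one pair.
negated = re.search(r"'[^']+'x", text)

print(lazy.group())     # 'a' 'b'x
print(negated.group())  # 'b'x
```

The lazy version is only lazy about where it stops first; once the rest of the pattern fails, it is still allowed to swallow quote characters.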


For example, to get all the matches as a list, use re.findall:

import re
 
user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
pattern = r" (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\")"
 
print(re.findall(pattern, user_cmd))

Output

[' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]
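If the leading space is not wanted in the results, one possible variation (an assumption about what you need, not part of the pattern above) is to put a capture group around the alternation only. When a pattern has exactly one capture group, re.findall returns the group's text instead of the full match:

```python
import re

user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
# Same alternation, but captured; the space stays outside the group.
inner = r" (0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\")"

tokens = re.findall(inner, user_cmd)
print(tokens)  # ['12', '0b110110', "' '", '" "', '"str"', "'str'"]
```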

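If you want to keep a single match over the whole string and still get every iteration of the repeated group, you can use captures() from the third-party regex module on PyPI (pip install regex), which records all values a group matched rather than only the last one: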

import regex

pattern = r"""( (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\"))+"""
user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
m = regex.match(pattern, user_cmd)
print(m.captures(1))

Output

[' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]
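If installing a third-party module is not an option, a rough standard-library equivalent is to validate the whole input first and then extract the items. This is only a sketch: tokenize and TOKEN are hypothetical names, and it assumes the input should consist of nothing but such items.

```python
import re

# One token: a space followed by a binary literal, an integer, or a quoted string.
TOKEN = r" (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\")"

def tokenize(user_cmd):
    """Hypothetical helper: return every token, or raise on stray text."""
    # The non-capturing repetition checks that the input is made of tokens only.
    if re.fullmatch("(?:" + TOKEN + ")+", user_cmd) is None:
        raise ValueError("input contains text that is not a valid token")
    # findall then yields each token as its own list item.
    return re.findall(TOKEN, user_cmd)

print(tokenize(' 12 0b110110 \' \' " " "str" \'str\''))
```

For the sample input this prints the same six items as captures(1), while an input such as ' 12 abc' raises ValueError instead of silently returning a partial list.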
