I'm trying to create a regular expression in Python that will match certain elements of a user-inputted string. So far, that is re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd).
When user_cmd = ' 12 0b110110 \' \' " " "str" \'str\'', re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd) returns <re.Match object; span=(0, 32), match=' 12 0b110110 \' \' " " "str" \'str\''>, which is the whole string.
So, because everything is matched and everything in the regex is in parentheses, everything should be in a group, right? It turns out not, because re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd).groups() returns (" 'str'",) (only one item). Why is this? How do I make the regular expression return each and every item it should in groups()?
Your pattern repeats a capture group, and a repeated group only keeps the value of its last iteration, so group 1 contains " 'str'".
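A minimal illustration of that behaviour, using a toy pattern and input chosen for brevity (they are not from the question): the overall match spans every iteration, but the repeated group only holds the text of the last one.

```python
import re

# A single capture group repeated with "+": each iteration overwrites group 1.
m = re.match(r"( [0-9]+)+", " 1 22 333")

print(m.group(0))  # the whole match: ' 1 22 333'
print(m.group(1))  # only the last iteration: ' 333'
print(m.groups())  # the pattern has one group, so one item: (' 333',)
```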
To get the separate matches you don't need to repeat a group, and if you only want the matches themselves you don't need a capture group at all.
Since every part starts with a space, you can match a space first and then use a non-capture group with the alternation |.
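The choice between a capture and a non-capture group matters for re.findall: when the pattern has exactly one capture group, findall returns that group's text instead of the whole match, and a repeated group yields only its last iteration. A short comparison on a made-up input:

```python
import re

text = " 12 0b110"

# One capture group: findall returns just the group's text (no leading space).
with_group = re.findall(r" (0b[10]+|[0-9]+)", text)

# Non-capture group: findall returns each full match, leading space included.
without_group = re.findall(r" (?:0b[10]+|[0-9]+)", text)

# Repeated capture group: the whole string is a single match, and findall
# returns only the last iteration of group 1.
repeated = re.findall(r"( (?:0b[10]+|[0-9]+))+", text)

print(with_group)     # ['12', '0b110']
print(without_group)  # [' 12', ' 0b110']
print(repeated)       # [' 0b110']
```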
Instead of the non-greedy quantifier .+? you can use a negated character class, which causes less backtracking. The resulting pattern, which starts with a literal space, is:
 (?:0b[10]+|[0-9]+|'[^']+'|"[^"]+")
- (?: (preceded by a space) Match a space and start a non-capture group for the alternation
- 0b[10]+ Match 0b and 1+ occurrences of 1 or 0
- | Or
- [0-9]+ Match 1+ digits 0-9
- | Or
- '[^']+' Match from ' till ' using a negated character class, which matches 1+ times any char except '
- | Or
- "[^"]+" Match from " till " using another negated character class
- ) Close the non-capture group

For example, to get all the matches with re.findall:
import re
user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
pattern = r" (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\")"
print(re.findall(pattern, user_cmd))
Output
[' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]
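The same pattern can also be written with the re.VERBOSE flag, so that each part carries its explanation from the breakdown. In verbose mode unescaped whitespace outside a character class is ignored, so this sketch writes the leading space as [ ]; the result should be the same list as above.

```python
import re

# Verbose form of the pattern: whitespace and "#" comments are ignored,
# except inside a character class, so the literal space is written as [ ].
pattern = re.compile(r"""
    [ ]            # a literal space before each item
    (?:            # start a non-capture group for the alternation
        0b[10]+    # 0b followed by one or more 1 or 0
      | [0-9]+     # or one or more digits
      | '[^']+'    # or one or more non-' characters between single quotes
      | "[^"]+"    # or one or more non-" characters between double quotes
    )              # close the non-capture group
""", re.VERBOSE)

user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
print(pattern.findall(user_cmd))
# [' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]
```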
If you want to keep the repeated group and still get the value of every iteration, you can use captures() from the PyPI regex module, which records all the captures of a group:
import regex
pattern = r"""( (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\"))+"""
user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
m = regex.match(pattern, user_cmd)
print(m.captures(1))
Output
[' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]
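If installing regex is not an option, a stdlib-only sketch is possible under the assumption that the input should consist of nothing but these items: check the whole string with re.fullmatch and the repeated group, then collect the items with re.findall and the single-item pattern. The helper name tokenize is hypothetical, made up for this example.

```python
import re

# One item: a space, then a binary literal, a number, or a quoted string.
ITEM = r" (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\")"

def tokenize(user_cmd):
    """Hypothetical helper: return every item, or None if anything else is present."""
    # fullmatch succeeds only when the whole string is a sequence of items.
    if re.fullmatch(f"(?:{ITEM})+", user_cmd) is None:
        return None
    # findall then collects each item separately, since ITEM has no capture group.
    return re.findall(ITEM, user_cmd)

print(tokenize(' 12 0b110110 \' \' " " "str" \'str\''))
# [' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]
print(tokenize(' 12 oops'))
# None
```

Splitting validation and extraction into two calls keeps each pattern simple, at the cost of scanning the string twice.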