My regex is failing to capture all groups when I run it in Python, and I'm at a loss as to why...
I'm trying to pull out the sequences of numbers.
import re
my_string = '467..114..'
re.search(r'(?:(\d+)(?:\.*))*',my_string).groups()
# outputs ('114',)
I expect this to get the two groups, '467', and '114'.
To explain the expression (or what I'm thinking for it):
(\d+) captures one or more digits
(?:\.*) is a non-capturing group matching zero or more dots
I've gotten things to pull out either the first or second set of digits, but I'm struggling to get both at once...
I do know I could use re.findall() to just get the numbers, but I want the span and I want to know why it isn't working...
Edit:
I've found that re.finditer() does indeed return Match objects, not just a list of strings like re.findall(). That seems like a reasonable way for me to go, but I'm still curious about how to get it to work with search, which I feel should be possible...
This won't work with search because you have a repeated capture group, which only keeps its last match (see for example this for an explanation), and it won't work if you remove the repeat either, since search only returns the first match (from the manual):
Scan through string looking for the first location where the regular expression pattern produces a match
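To illustrate both points, here is a small sketch using the string from the question. The variable names m and first are just for illustration. The repeated group matches the whole string, but group 1 is overwritten on each iteration, so only the last capture is kept; without the repeat, search stops at the first match.

```python
import re

my_string = '467..114..'

# Repeated capture group: the overall match covers the whole string,
# but group 1 only holds the capture from the last iteration.
m = re.search(r'(?:(\d+)(?:\.*))*', my_string)
print(m.group(0))   # '467..114..'
print(m.groups())   # ('114',)
print(m.span(1))    # (5, 8)

# Without the repeat, search returns only the first match.
first = re.search(r'(\d+)\.*', my_string)
print(first.group(1), first.span(1))   # 467 (0, 3)
```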
For your use case, you can use finditer, but you still need to remove the repeat on the outer group or you will match the entire string, i.e. re.finditer(r'(?:(\d+)(?:\.*))',my_string)
Note that with the use of finditer, the (?:\.*) is redundant since it can match an empty string, so it does not require anything to follow the digits. If you want to enforce that the digits are followed by a ., you need to change that to (?:\.+); otherwise, just remove it and you can simply use \d+. Either way, you can also remove most of the groups.
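The difference between \.* and \.+ only shows up when some digits are not followed by a dot. As a quick sketch, assuming a made-up sample string (not from the question) that contains one bare number:

```python
import re

# Hypothetical input: '12' has no trailing dots, '34' does.
sample = '12 34..'

# \.* may match nothing, so the bare number still matches.
optional = [m.group() for m in re.finditer(r'\d+\.*', sample)]
print(optional)   # ['12', '34..']

# \.+ requires at least one dot, so only '34..' matches.
required = [m.group() for m in re.finditer(r'\d+\.+', sample)]
print(required)   # ['34..']
```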
So, to match the numbers with the dots, you can do:
[m for m in re.finditer(r'\d+\.+',my_string)]
And you will get
[
<re.Match object; span=(0, 5), match='467..'>,
<re.Match object; span=(5, 10), match='114..'>
]
Or to match only the numbers:
[m for m in re.finditer(r'\d+',my_string)]
Output:
[
<re.Match object; span=(0, 3), match='467'>,
<re.Match object; span=(5, 8), match='114'>
]
If you might have other numbers in the string and you only want to match the numbers followed by dots, either match them as above or use a lookahead to assert they are there:
[m for m in re.finditer(r'\d+(?=\.)',my_string)]
Output:
[
<re.Match object; span=(0, 3), match='467'>,
<re.Match object; span=(5, 8), match='114'>
]
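Since the question asks for the spans, another option (a sketch, not the only way to do it) is to keep a single capture group around the digits and read the text and span from each Match object. Here the dots are consumed by the match, but span(1) still refers to the digits alone. The name results is just for illustration.

```python
import re

my_string = '467..114..'

# Group 1 holds the digits; the trailing dots are required but not captured.
results = [(m.group(1), m.span(1)) for m in re.finditer(r'(\d+)\.+', my_string)]
print(results)   # [('467', (0, 3)), ('114', (5, 8))]
```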