This question was inspired by the following question on Code Review: Converting a string to an array of integers, which opens as follows:
I am dealing with a string draw_result that can be in one of the following formats:
"03-23-27-34-37, Mega Ball: 13"
"01-12 + 08-20"
"04-15-17-25-41"
I always start with draw_result where the value is one from the above values. I want to get to:
[3, 23, 27, 34, 37]
[1, 12, 8, 20]
[4, 15, 17, 25, 41]
This can be solved with multiple regular expressions, as follows:
import re
from typing import Iterable

lottery_searches = [
    re.compile(pat).match
    for pat in (
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d\d), Mega Ball.*$',
        r'^(\d\d)-(\d\d) \+ (\d\d)-(\d\d)$',
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d+)$',
    )
]


def lottery_string_to_ints(lottery: str) -> Iterable[int]:
    for search in lottery_searches:
        if match := search(lottery):
            return (int(g) for g in match.groups())
    raise ValueError(f'"{lottery}" is not a valid lottery string')
Assume that we allow a separator other than -, but all separators within one string must be the same. For instance,
"04/15/17/25/41"
"04,15,17,25,41"
"01,12 + 08,20"
"01?12 + 08?20"
are all valid formats.
Is it now possible to only have the digits in capture groups? Is it possible to mark all digit capture groups in some way for easy retrieval?
Regex
PATTERN = re.compile(
    r"""
    (?P<digit0>\d\d)            # Matches a two-digit number [00..99] and names it digit0
    (?P<sep>-)                  # Matches the separator (here a literal -) and saves it as sep
    (?P<digit1>\d\d)            # Matches a two-digit number [00..99] and names it digit1
    (\s+\+\s+|(?P=sep))         # Matches SPACE + SPACE or the separator saved in sep (-)
    (?P<digit2>\d\d)            # Matches a two-digit number [00..99] and names it digit2
    (?P=sep)                    # Matches the separator saved in sep again
    (?P<digit3>\d\d)            # Matches a two-digit number [00..99] and names it digit3
    ((?P=sep)(?P<digit4>\d\d))? # Optionally matches a fifth number (e.g. -01) and names it digit4
    """,
    re.VERBOSE,
)
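As a minimal sketch of how the pattern above might be generalised to the other separators, assume a separator is any single character that is neither a digit nor whitespace. The names ANY_SEP_PATTERN and extract_numbers_any_sep are hypothetical, and non-capturing groups (?:...) are used for everything that is not a number, so only the named number groups and sep capture anything.

```python
import re

# Hypothetical generalisation of PATTERN. Assumption: a separator is any
# single character that is neither a digit nor whitespace. The backreference
# (?P=sep) forces every later separator to be that same character.
ANY_SEP_PATTERN = re.compile(
    r"""
    (?P<digit0>\d\d)               # first number
    (?P<sep>[^\d\s])               # separator, saved as sep
    (?P<digit1>\d\d)               # second number
    (?:\s+\+\s+|(?P=sep))          # SPACE + SPACE or the saved separator
    (?P<digit2>\d\d)               # third number
    (?P=sep)                       # the saved separator again
    (?P<digit3>\d\d)               # fourth number
    (?:(?P=sep)(?P<digit4>\d\d))?  # optional fifth number
    """,
    re.VERBOSE,
)


def extract_numbers_any_sep(draw_result):
    """Return the numbers in draw_result, or [] if it does not match."""
    match = ANY_SEP_PATTERN.match(draw_result)
    if match is None:
        return []
    # An optional group that did not take part in the match is None.
    values = (match.group(f"digit{i}") for i in range(5))
    return [int(value) for value in values if value is not None]
```

With this sketch, "04/15/17/25/41" and "01?12 + 08?20" are accepted, while a string that mixes separators, such as "04-15/17-25", is rejected.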
Retrieval
def extract_numbers_narrow(draw_result, digits=5):
    numbers = []
    if match := re.match(PATTERN, draw_result):
        for i in range(digits):
            ith_digit = f"digit{i}"
            try:
                number = int(match.group(ith_digit))
            except IndexError:  # Raised if the group does not exist in the pattern
                continue
            except TypeError:  # Raised if the group did not match (its value is None)
                continue
            numbers.append(number)
    return numbers
The try statement is needed because the fifth number might or might not appear in the result.

It seems you want to get all numbers before a comma. You can use this solution based on the PyPI regex module:
import regex

texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = regex.compile(r'^(?:[^\w,]*(\d+))+')
for text in texts:
    match = reg.search(text)
    if match:
        print(text, '=>', list(map(int, match.captures(1))))
See the online Python demo.
The ^(?:[^\w,]*(\d+))+ regex matches, at the start of the string, one or more sequences of zero or more characters other than word and comma characters followed by one or more digits (captured into Group 1). Since the regex module keeps a stack for each capturing group, you can access all captured numbers with .captures().
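To illustrate why .captures() matters here, the following small sketch runs the same pattern through the built-in re module, which keeps only the last repetition of a repeated group. The variable names last_capture and whole_match are just for illustration.

```python
import re

# The same pattern, compiled with the built-in re module instead of regex.
reg = re.compile(r'^(?:[^\w,]*(\d+))+')

match = reg.search('01-12 + 08-20')

# re stores only the final repetition of a repeated capturing group,
# so the earlier numbers are not reachable through group 1.
last_capture = match.group(1)

# The text of the whole match is still available.
whole_match = match.group()

print(last_capture)  # 20
print(whole_match)   # 01-12 + 08-20
```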
If you need to do it with the built-in re module, you can use
import re

texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = re.compile(r'^(?:[^\w,]*\d+)+')
for text in texts:
    match = reg.search(text)
    if match:
        print(text, '=>', list(map(int, re.findall(r'\d+', match.group()))))
See this Python demo, where re.findall(r'\d+', ...) extracts the numbers from the match value.
Both output:
03-23-27-34-37, Mega Ball: 13 => [3, 23, 27, 34, 37]
01-12 + 08-20 => [1, 12, 8, 20]
04-15-17-25-41 => [4, 15, 17, 25, 41]
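As a rough check against the alternative separators mentioned in the question, the built-in re variant can be wrapped in a small helper. This is only a sketch, and the name numbers_before_comma is hypothetical. It shows two things about the pattern. Because it excludes commas, a comma-separated draw yields only the first number. It also does not require all separators to be the same.

```python
import re

# Same pattern as the built-in re example above.
REG = re.compile(r'^(?:[^\w,]*\d+)+')


def numbers_before_comma(text):
    """Return the numbers found before the first comma or word character."""
    match = REG.search(text)
    if not match:
        return []
    return [int(n) for n in re.findall(r'\d+', match.group())]


print(numbers_before_comma('04/15/17/25/41'))  # [4, 15, 17, 25, 41]
print(numbers_before_comma('01?12 + 08?20'))   # [1, 12, 8, 20]
print(numbers_before_comma('04,15,17,25,41'))  # [4], the comma ends the match
print(numbers_before_comma('04-15/17-25'))     # [4, 15, 17, 25], mixed separators pass
```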