I'm trying to split any number string, such as 3.1415926535897932384626433832795028841971, right after each 0 or group of 0s. However, I would like to keep the 0s at the end of each piece rather than discard them.
For example, the string 10203040506070809011 should be split into
['10', '20', '30', '40', '50', '60', '70', '80', '90', '11']
and the string 3.1415926535897932384626433832795028841971 should be split into
['3.14159265358979323846264338327950', '28841971']
I tried to split apart the string with a positive lookbehind and an empty string:
import re
p = '(?<=0+)'
re.search(p, '102030405')
><_sre.SRE_Match object; span=(2, 2), match=''>
'102030405'.split(p)
>['102030405']
but this does not split apart the string at all, even though the pattern is matched.
I also tried just splitting the string on 0 and appending a 0 to every piece except the last, but that seems convoluted and inefficient:
l = '102030405'.split('0')
[e+'0' for e in l[:-1]] + [l[-1]]
>['10', '20', '30', '40', '5']
Is there any way to split a string based on a lookahead or lookbehind on an empty string? I'm asking about the general case, not just with numbers. For example, if I wanted to split 3:18am5:19pm10:28am into the separate times without losing the am or pm, and get the list ['3:18am', '5:19pm', '10:28am'], how would I go about doing this?
This simple regex in re.findall should suffice:
l = re.findall(r'[.1-9]+(?:0+|$)', s)
Note:
findall returns all non-overlapping matches of the pattern in the string, as a list of strings.
For each match we want the longest run of non-zero digits (or dots) followed by at least one zero or by the end of the string.
The trailing zeros are wrapped in a non-capturing group (?:...) because, with a capturing group, findall would return only the group's contents instead of the whole match.
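To check the pattern against both inputs from the question, here is a minimal runnable sketch. The helper name split_after_zeros is only illustrative and is not part of the re module.

```python
import re

# Pattern from the answer: a run of non-zero digits or dots,
# followed by one or more zeros or by the end of the string.
PATTERN = re.compile(r'[.1-9]+(?:0+|$)')

def split_after_zeros(s):
    """Hypothetical helper: return the pieces of s, each ending after a group of zeros."""
    return PATTERN.findall(s)

print(split_after_zeros('10203040506070809011'))
# ['10', '20', '30', '40', '50', '60', '70', '80', '90', '11']
print(split_after_zeros('3.1415926535897932384626433832795028841971'))
# ['3.14159265358979323846264338327950', '28841971']
```

One caveat of this approach: every match must start with a non-zero character, so leading zeros are dropped (for instance, '0012' yields ['12']).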
Similarly for your second example:
>>> re.findall(r'[\d:]+(?:am|pm|$)', '3:18am5:19pm10:28am')
['3:18am', '5:19pm', '10:28am']
No need for lookahead/lookbehind magic, or non-greedy matching.
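That said, if you really do want to split on an empty match, here is a sketch of how it could look. It assumes Python 3.7 or later, where re.split splits on zero-width matches. Note that str.split treats its argument as a literal separator, not a regex, which is why '102030405'.split(p) returns the string unchanged. Also, the re module requires lookbehind patterns to have a fixed width, so the sketch uses (?<=0) plus a lookahead instead of (?<=0+). The names ZERO_BOUNDARY and TIME_BOUNDARY are made up for illustration.

```python
import re

# Zero-width boundary: just after a 0 and just before a non-zero character.
# The lookahead also prevents a split at the very end of the string.
ZERO_BOUNDARY = r'(?<=0)(?=[^0])'

# Zero-width boundary: just after "am" or "pm", but not at the end of the string.
TIME_BOUNDARY = r'(?<=[ap]m)(?=.)'

# Assumes Python 3.7+, where re.split accepts patterns that match the empty string.
print(re.split(ZERO_BOUNDARY, '10203040506070809011'))
# ['10', '20', '30', '40', '50', '60', '70', '80', '90', '11']
print(re.split(TIME_BOUNDARY, '3:18am5:19pm10:28am'))
# ['3:18am', '5:19pm', '10:28am']
```

Unlike the findall version, this keeps every character of the input, including leading zeros, because nothing is consumed by the split.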