Tags: python, regex, findall

Python regex.findall not finding all matches of shorter length


How can I get all possible matches, including the shorter ones where a * or + quantifier does not consume every character it could?

import regex as re  # the PyPI "regex" module, not the stdlib "re"
matches = re.findall(r"^\d+", "123")  # raw string keeps the \d escape intact
print(matches)
# actual output: ['123']
# desired output: ['1', '12', '123']

I need the matches to be anchored to the start of the string (hence the ^), but the + does not seem to consider shorter matches at all. I tried adding overlapped=True to the findall call, but that does not change the output.

Making the quantifier lazy (^\d+?) gives ['1'], with or without overlapped=True. Why doesn't it keep searching for the longer matches?
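Roughly speaking, this follows from how a match attempt works: from a given start position the engine reports a single match (the longest for greedy +, the shortest for lazy +?), and overlapped matching moves the start position forward rather than trying shorter end positions, which ^ then rules out. The sketch below illustrates this with the standard-library re module; the capturing lookahead (?=(...)) is a common stdlib stand-in for overlapped matching and is an assumption of this illustration, not part of the regex module's API.

```python
import re

s = "123"

# From a single start position the engine reports one match:
# the longest for greedy +, the shortest for lazy +?.
greedy = re.findall(r"^\d+", s)
lazy = re.findall(r"^\d+?", s)

# Overlapped matching (emulated with a capturing lookahead) retries from
# every *start* position; it never goes back to try a shorter end.
overlapped = re.findall(r"(?=(\d+))", s)

# With ^ added, only start position 0 can match, so one result remains.
overlapped_anchored = re.findall(r"(?=(^\d+))", s)

print(greedy, lazy, overlapped, overlapped_anchored)
# ['123'] ['1'] ['123', '23', '3'] ['123']
```

So overlapped matching yields suffixes such as '23' and '3', not the shorter prefixes '1' and '12' the question asks for.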

I could always build the shorter substrings myself and test each one with the regex, but that seems rather inefficient, and surely there must be a way for the regex to do this by itself.

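import regex as re

s = "123"
matches = []
# Run the anchored pattern against every prefix s[:0], s[:1], ..., s[:len(s)]
for length in range(len(s) + 1):
    matches.extend(re.findall(r"^\d+", s[:length]))
print(matches)
# output: ['1', '12', '123']
# but clunky :(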

Edit: the ^\d+ regex is just an example; I need this to work for any regex. I should have stated this up front; my apologies.
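For an arbitrary pattern whose matches start at index 0, the prefix loop can at least be tightened. This sketch uses the standard-library re module rather than regex, and checks each prefix once with Pattern.fullmatch and its endpos argument, so nothing is sliced. Unlike findall on each slice, which can report the same short match once per prefix (for example ^a against "abc" gives ['a', 'a', 'a']), it reports each matching prefix exactly once. The helper name matching_prefixes is hypothetical. It still performs len(s) + 1 match attempts, so it is not asymptotically faster than the slicing loop.

```python
import re

def matching_prefixes(pattern, s, flags=0):
    """Return every prefix of s that the pattern matches in full.

    Hypothetical helper: each end position is tried once via the endpos
    argument of Pattern.fullmatch, so no substrings are copied.
    """
    compiled = re.compile(pattern, flags)
    results = []
    # end runs from 0 to len(s) so that empty matches (e.g. from \d*) are kept
    for end in range(len(s) + 1):
        m = compiled.fullmatch(s, 0, end)
        if m is not None:
            results.append(m.group())
    return results

print(matching_prefixes(r"^\d+", "123"))  # ['1', '12', '123']
print(matching_prefixes(r"\d*", "12a"))   # ['', '1', '12']
```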


Solution

  • You could use overlapped=True with the PyPI regex module together with reverse searching, (?r).

    Then reverse the list returned by re.findall:

    import regex as re
    
    res = re.findall(r"(?r)^\d+", "123", overlapped=True)
    res.reverse()
    print(res)
    

    Output

    ['1', '12', '123']
    
