Using Python re and findall to match complex combination of digits in string


I'm trying to use the Python re library to analyze a string containing a street name and one or more numbers separated by forward slashes.

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'

I want to match every number, including the digits after a dot and any adjacent alpha characters. If a hyphen connects two numbers with an alpha character, they should also be treated as one match.


Expected output:

['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

I'm trying the following:

numbers = re.findall(r'\d+\.*\d*\w[-\w]*', example)

This finds all of them except a lone single digit (i.e. '1'):

print(numbers)

['2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c'] 

How do I need to tweak my regex in order to achieve the desired output?


Solution

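  • The pattern \d+\.*\d*\w[-\w]* does not match the single 1 because it requires at least 2 characters: at least 1 digit for \d+, followed by 1 word character for \w. The lone 1 is followed by /, so nothing is left for \w to consume. A number such as 10 still matches because \d+ can give up its last digit to \w.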
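To see the two-character minimum in isolation, here is a minimal sketch that runs the pattern from the question against a shortened input. The `sample` string is a hypothetical input chosen for illustration, not part of the original question.

```python
import re

# The pattern from the question: \d+ needs a digit and \w needs one more
# word character, so every match is at least 2 characters long.
original = r'\d+\.*\d*\w[-\w]*'

# Hypothetical shortened input (assumption, for illustration only)
sample = '1/10/3a'

matches = re.findall(original, sample)
print(matches)  # ['10', '3a']  (the lone '1' is skipped)
```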

    Assuming a number should not end in -, only the characters a-z can directly follow the digits, and the match is case-insensitive, you could use:

    \b\d+(?:\.\d+)?[a-z]*(?:-\w+)*
    
    • \b A word boundary
    • \d+(?:\.\d+)? Match digits with an optional decimal part
    • [a-z]* Match optional chars a-z
    • (?:-\w+)* optionally repeat matching - and 1 or more word characters
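The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of an alternative layout; the name `NUMBER_RE` is made up for illustration, and the matching behavior is intended to be the same as the one-line pattern.

```python
import re

# Hypothetical name; the parts mirror the bullet list above.
NUMBER_RE = re.compile(
    r"""
    \b            # a word boundary
    \d+           # one or more digits
    (?:\.\d+)?    # optional decimal part
    [a-z]*        # optional chars a-z
    (?:-\w+)*     # optionally repeat - and 1 or more word characters
    """,
    re.VERBOSE | re.IGNORECASE,
)

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
result = NUMBER_RE.findall(example)
print(result)
```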

    Note that matching an address can be hard, as there can be many different notations. This pattern only matches the format given in the example string.

    import re
    
    example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
    pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
    # re.IGNORECASE makes [a-z] also match uppercase suffixes such as '3A'
    print(re.findall(pattern, example, flags=re.IGNORECASE))
    

    Output

    ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
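The two stated conditions (no trailing hyphen, case-insensitive letters) can be checked with a small sketch. The `sample` string below is a hypothetical input, not taken from the question.

```python
import re

PATTERN = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"

# Hypothetical input with a trailing hyphen and an uppercase suffix
sample = 'Street 12a-/7B'

# With the flag, [a-z] also accepts 'B'; the dangling '-' is never included
# because (?:-\w+)* requires word characters after each hyphen.
with_flag = re.findall(PATTERN, sample, flags=re.IGNORECASE)
without_flag = re.findall(PATTERN, sample)

print(with_flag)     # ['12a', '7B']
print(without_flag)  # ['12a', '7']
```

If uppercase suffixes are expected in real data, passing re.IGNORECASE (or widening the class to [a-zA-Z]) avoids silently dropping them.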