I'm trying to use Python's re library to analyze a string containing a street name and one or more numbers separated by forward slashes.
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
I want to match every number, including any decimal part after the dot and any adjacent letters. If a hyphen connects two numbers (or a number and a letter), the whole range should count as one match.
Expected output:
['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
I'm trying the following:
numbers = re.findall(r'\d+\.*\d*\w[-\w]*', example)
This finds all of them except a plain single digit (i.e. '1'):
print(numbers)
['2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
How should I tweak my regex to get the desired output?
The pattern does not match the single 1 because \d+\.*\d*\w[-\w]* requires at least 2 characters: at least 1 digit for \d+ and 1 word character for \w.
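A quick way to confirm this is to test the pattern from the question against individual tokens with re.fullmatch. This is a minimal sketch; the tokens are simply taken from the example string:

```python
import re

# The pattern from the question
original = r'\d+\.*\d*\w[-\w]*'

for token in ['1', '10', '3a']:
    # fullmatch returns None when the whole token cannot be matched
    print(token, bool(re.fullmatch(original, token)))

# 1 False
# 10 True
# 3a True
```

'10' still matches because \d+ can give back its last digit so that \w matches the '0'; a lone '1' has nothing left over for \w.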
If the address should not end with a - and only the characters a-z may follow the digits, you can use this pattern with a case-insensitive match:
\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*
- \b A word boundary
- \d+(?:\.\d+)? Match digits with an optional decimal part
- [a-z]* Match optional chars a-z
- (?:-\w+)* Optionally repeat matching - and 1 or more word characters

Note that matching an address can be hard, as there can be many different notations; this pattern matches the format given in the example string.
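The same pattern can also be written with re.VERBOSE, so that each part of the breakdown sits next to its own comment. This is a sketch that should behave like the one-line pattern; re.IGNORECASE is added here to get the case-insensitive match mentioned above:

```python
import re

# Same pattern as the one-liner, split up with re.VERBOSE comments
pattern = re.compile(r"""
    \b              # a word boundary
    \d+(?:\.\d+)?   # digits with an optional decimal part
    [a-z]*          # optional chars a-z
    (?:-\w+)*       # optionally repeat matching - and 1 or more word characters
""", re.VERBOSE | re.IGNORECASE)

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
print(pattern.findall(example))
# ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
```

In verbose mode, whitespace and # comments inside the pattern are ignored, so the layout has no effect on what gets matched.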
import re
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
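# re.IGNORECASE gives the case-insensitive match, so e.g. 3A or 12A-12C would match too
print(re.findall(pattern, example, re.IGNORECASE))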
Output
['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
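To check the two extra requirements (a trailing - is not included, and uppercase letters need the case-insensitive flag), here is a small sketch. The sample string is made up for illustration and is not from the question:

```python
import re

pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
# Hypothetical input with uppercase letters and a trailing hyphen
sample = 'Examplestreet 3A/12A-12C/14-'

print(re.findall(pattern, sample, re.IGNORECASE))  # ['3A', '12A-12C', '14']
print(re.findall(pattern, sample))                 # ['3', '12', '12', '14']
```

Without the flag, [a-z]* cannot consume the uppercase letters, so 3A loses its letter and 12A-12C is split into two separate matches. In both cases the trailing - after 14 is dropped, because (?:-\w+)* needs at least one word character after the hyphen.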