c# .net regex street-address

Regex: do not match a group that is followed by \.\d


I am trying to find addresses in different texts. It works quite well, except that it also matches a word followed by a date (foobar 22.01.2012 => address: foobar 22). So I would like to improve the regex so that a street number MUST NOT be followed by "(.|:)\d".

This is what I have:

(?<str>\b([a-zA-Z]+-*[a-zA-Z]+(-|\s)*([a-zA-Z]|-)+)\b\.?\s{1})(?<no>\d+(\s?[a-zA-Z])?\b)

A representative text:

Consultation hours
Monday, the 06.02. until Friday, the 10.02.2012 and
Monday, the 13.02. until Tuesday, the 14.02.2012,
each 14.00-15.30 o'clock, second floor,
Am Fasanengarten 12 foobar
Schlossstr. 34

What should be found?
Am Fasanengarten 12
Schlossstr. 34

What is found?
the 06
the 10
the 13
the 14
each 14
Am Fasanengarten 12
foobar // why is this a match? Without number?
Schlossstr. 34

I tried various positive and negative lookbehinds and lookaheads, but with no luck.


Solution

  • Try this:

    (?<str>\b(?:[a-zA-Z]+-*[a-zA-Z]+(?:[ \t-])*(?:[a-zA-Z]|-)+)\b\.?\s)(?<no>\d+(?:\s?[a-zA-Z])?\b)(?![.:]\d)
    

    See it here on Regexr

    The negative lookahead (?![.:]\d) at the end ensures that the number is not directly followed by a "." or a ":" and then a digit.
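
    As a rough illustration, the pattern above could be run against the sample text from the question like this. This is only a sketch: the class name AddressFinder and the method Find are made up for the example, and the street name is trimmed purely for display.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class AddressFinder
    {
        // Pattern from the answer, unchanged. The trailing (?![.:]\d) rejects
        // numbers that continue as a date or a time (e.g. "10.02" or "14:30").
        private static readonly Regex AddressPattern = new Regex(
            @"(?<str>\b(?:[a-zA-Z]+-*[a-zA-Z]+(?:[ \t-])*(?:[a-zA-Z]|-)+)\b\.?\s)(?<no>\d+(?:\s?[a-zA-Z])?\b)(?![.:]\d)");

        // Hypothetical helper: returns every street/number pair found in the text.
        public static List<(string Street, string Number)> Find(string text)
        {
            var result = new List<(string Street, string Number)>();
            foreach (Match m in AddressPattern.Matches(text))
            {
                // The "str" group ends with one whitespace character; trim it for display.
                result.Add((m.Groups["str"].Value.Trim(), m.Groups["no"].Value));
            }
            return result;
        }

        public static void Main()
        {
            // Sample text from the question, joined with explicit \n line breaks.
            string text = string.Join("\n", new[]
            {
                "Consultation hours",
                "Monday, the 06.02. until Friday, the 10.02.2012 and",
                "Monday, the 13.02. until Tuesday, the 14.02.2012,",
                "each 14.00-15.30 o'clock, second floor,",
                "Am Fasanengarten 12 foobar",
                "Schlossstr. 34"
            });

            foreach (var (street, number) in Find(text))
            {
                Console.WriteLine($"{street} | {number}");
            }
            // Am Fasanengarten | 12
            // Schlossstr. | 34
        }
    }
    ```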

    foobar // why is this a match? Without number?
    Schlossstr. 34

    This matches because you allow \s between the words of the street name:

    (?<str>\b([a-zA-Z]+-*[a-zA-Z]+(-|\s)*([a-zA-Z]|-)+)\b\.?\s{1})(?<no>\d+(\s?[a-zA-Z])?\b)
                                     ^^ here
    

    In my solution I replaced this with [ \t-], which allows only a space, a tab, or a hyphen.

    \s means "whitespace", which also includes the line-break characters, and that is why foobar is matched. If you had looked at the captured group, you would have seen that the match is the address "foobar Schlossstr. 34", spanning the line break.
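
To see the difference in isolation, the following sketch runs the original pattern from the question and a variant in which only (-|\s)* is swapped for [ \t-]* against the two lines in question. The class name WhitespaceDemo and the method FirstMatch are invented for this example, and the line break is printed as a literal \n to make it visible.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Original pattern from the question: (-|\s)* lets the street name span a line break.
    public const string Original =
        @"(?<str>\b([a-zA-Z]+-*[a-zA-Z]+(-|\s)*([a-zA-Z]|-)+)\b\.?\s{1})(?<no>\d+(\s?[a-zA-Z])?\b)";

    // Same pattern with (-|\s)* replaced by [ \t-]*, so only space, tab and hyphen may join words.
    public const string Restricted =
        @"(?<str>\b([a-zA-Z]+-*[a-zA-Z]+[ \t-]*([a-zA-Z]|-)+)\b\.?\s{1})(?<no>\d+(\s?[a-zA-Z])?\b)";

    // Hypothetical helper: returns the text of the first match, or "" if there is none.
    public static string FirstMatch(string pattern, string text)
    {
        return Regex.Match(text, pattern).Value;
    }

    public static void Main()
    {
        string text = "foobar\nSchlossstr. 34";

        // Show the line break as a visible \n in the output.
        Console.WriteLine(FirstMatch(Original, text).Replace("\n", "\\n"));
        // foobar\nSchlossstr. 34
        Console.WriteLine(FirstMatch(Restricted, text).Replace("\n", "\\n"));
        // Schlossstr. 34
    }
}
```

Note that both variants still use \s between the street name and the number, so a word at the end of one line followed by a number at the start of the next could still be picked up; whether that matters depends on the input texts.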