For data in the following structure I want to obtain the parsed street name details:
# streetname 1() refers to house number 1 with an empty () additional qualifier
keyword_token: street name 4()
keyword_token: street-name 14()
keyword_token: streetname 123()keyword_token: streetname 123()
# why is it logged one message per line, but we get the address logged twice - sometimes??
keyword_token: streetname 9(7)keyword_token: streetname 9(7)
keyword_token: streetname 27()\r\n a lot more text and log messages in the free form text log - one message per line \n
keyword_token: street-name 1-23(BLOCK D HAUS 6)keyword_token: street-name 1-23(BLOCK H HAUS 2)keyword_token: street-name 1-23(BLOCK G HAUS 3)',
The ideal expected result is 3 fields for each record:
So far I have experimented with the regex keyword_token(.*), but this returns the whole rest of the line after the keyword token.
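To illustrate the problem with that first attempt, here is a small sketch in Python (the language is an assumption, the question does not name one). Because .* is greedy and is not followed by anything that stops it, it runs to the end of the line, so a second address logged on the same line ends up inside the one capture:

```python
import re

# One of the sample lines where the address is logged twice.
line = "keyword_token: streetname 9(7)keyword_token: streetname 9(7)"

# The first attempt: everything after the keyword is captured in one group.
greedy = re.search(r"keyword_token(.*)", line)
captured = greedy.group(1)

# The capture still contains the colon, the house number, the qualifier
# and the complete second record.
print(captured)
```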
Complications:
keyword_token:
keyword_token:
and go until the (
edit: an example on regex101 can be found here: https://regex101.com/r/ueEfNU/1
edit 2: house numbers that are not purely numeric also need to be supported.
keyword_token: street_name 32a()
You can use
keyword_token:\s*(.*?)\s*(\d[a-zA-Z\d-]*)\(([^()]*)\)
See the regex demo. Details:
keyword_token: - a fixed string
\s* - zero or more whitespaces
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible (due to the *? lazy quantifier)
\s* - zero or more whitespaces
(\d[a-zA-Z\d-]*) - Group 2: a digit and then zero or more letters, digits or - chars
\( - a ( char
([^()]*) - Group 3: zero or more chars other than ( and )
\) - a ) char.
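As a minimal sketch of how the pattern could be applied, assuming Python (the question does not say which regex engine is used), the following collects every match with finditer, so lines that contain several records yield one tuple per record. The names ADDRESS_RE and parse_addresses are hypothetical and only used for illustration; the sample lines are taken from the question.

```python
import re

# The pattern from the answer, compiled once for reuse.
ADDRESS_RE = re.compile(r"keyword_token:\s*(.*?)\s*(\d[a-zA-Z\d-]*)\(([^()]*)\)")


def parse_addresses(log_text):
    """Return a list of (street, house_number, qualifier) tuples.

    finditer scans the whole text, so several records on one line
    produce several tuples. An empty () gives an empty qualifier string.
    """
    return [match.groups() for match in ADDRESS_RE.finditer(log_text)]


sample = (
    "keyword_token: street name 4()\n"
    "keyword_token: streetname 9(7)keyword_token: streetname 9(7)\n"
    "keyword_token: street-name 1-23(BLOCK D HAUS 6)\n"
    "keyword_token: street_name 32a()\n"
)

for street, house_number, qualifier in parse_addresses(sample):
    print(repr(street), repr(house_number), repr(qualifier))
```

Records that are logged twice on the same line are returned twice. If that is not wanted, the result can be de-duplicated afterwards, which is left out here.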