I am trying to do the following: from a list of strings extract anything before the first occurrence (there may be more than one) of a whitespace followed by a round bracket "(".
I have tried the following:
re.findall("(.*)\s\(", line)
but it gives the wrong results for, e.g., the following strings:
Carrollton (University of West Georgia)[2]
Dahlonega (North Georgia College & State University)[2]
Thanks in advance
To extract anything before the first occurrence of a whitespace char followed by a round bracket "(", you may use re.search (this method is meant to extract the first match only):
re.search(r'^(.*?)\s\(', text, re.S).group(1)
re.search(r'^\S*(?:\s(?!\()\S*)*', text).group()
See the regex #1 demo and the regex #2 demo. Note that the second one, though longer, is much more efficient since it follows the unroll-the-loop principle.
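To see why the greedy pattern from the question goes wrong when a string contains more than one whitespace + "(" sequence, here is a small sketch. The sample string is made up for illustration (it is not from the question), and the variable names are arbitrary:

```python
import re

# Hypothetical sample containing two " (" sequences, to expose the difference.
text = "Isla Vista (University of California (UC), Santa Barbara)[2]"

# Greedy .* runs to the end of the string and backtracks to the LAST " (".
greedy = re.findall(r"(.*)\s\(", text)

# Lazy .*? expands one char at a time and stops at the FIRST " (".
lazy = re.search(r"^(.*?)\s\(", text, re.S).group(1)

# Unrolled pattern: consumes non-whitespace chunks and only those
# whitespace chars that are not followed by "(".
unrolled = re.search(r"^\S*(?:\s(?!\()\S*)*", text).group()

print(greedy)
print(lazy)
print(unrolled)
```

Both patterns from this answer stop at the first " (", while the greedy one captures everything up to the last one.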
Regex #1 details:
- ^ - start of string
- (.*?) - Group 1: any 0+ chars, as few as possible
- \s\( - a whitespace and a ( char.

Or, better, regex #2:
- ^\S* - start of string and then 0+ non-whitespace chars
- (?:\s(?!\()\S*)* - 0 or more occurrences of
  - \s(?!\() - a whitespace char not followed with (
  - \S* - 0+ non-whitespace chars

See Python demo:
import re

strs = ['Isla Vista (University of California, Santa Barbara)[2]','Carrollton (University of West Georgia)[2]','Dahlonega (North Georgia College & State University)[2]']
# Compile once, reuse for every string
rx = re.compile(r'^\S*(?:\s(?!\()\S*)*', re.S)
for s in strs:
    m = rx.search(s)
    if m:
        # Text before the first whitespace + "("
        print('{} => {}'.format(s, m.group()))
    else:
        print("{}: No match!".format(s))
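One thing worth keeping in mind: the two patterns behave differently when a string contains no whitespace + "(" at all. The sketch below wraps each one in a small helper; the function names are hypothetical, chosen just for this illustration. Regex #1 finds no match in that case, so calling .group(1) directly would raise AttributeError, whereas regex #2 can match an empty string at the start and therefore always matches, returning the whole input.

```python
import re

# Regex #2 from above, compiled once for reuse.
RX_UNROLLED = re.compile(r'^\S*(?:\s(?!\()\S*)*')
# Regex #1 from above.
RX_LAZY = re.compile(r'^(.*?)\s\(', re.S)

def before_paren(s):
    """Hypothetical helper: text before the first whitespace + '('.

    Falls back to the whole string when there is no such sequence,
    because the unrolled pattern always matches.
    """
    return RX_UNROLLED.search(s).group()

def before_paren_strict(s):
    """Hypothetical helper: same, but returns None when there is no ' ('."""
    m = RX_LAZY.search(s)
    return m.group(1) if m else None

for s in ['Carrollton (University of West Georgia)[2]', 'No brackets here']:
    print(repr(before_paren(s)), repr(before_paren_strict(s)))
```

Which behavior you want depends on whether an input without a bracket should be kept as is or flagged as a non-match.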