Tags: python, regex, regex-lookarounds, regex-group, regex-greedy

RegEx for ignoring parentheses in a string


There is a string like this:

strs = "Tierd-Branden This is (L.A.) 105  / New (Even L.A.A)"

After trying the following code, I don't get my expected output:

import re
strs = "Tierd-Branden This is (L.A.) 105  / New (Even L.A.A)"
print(re.findall(r"[\w']+[\w\.]", strs))

I expect this:

['Tierd', 'Branden', 'This', 'is', 'L.A.', '105', 'New', 'Even', 'L.A.A']

But I get this:

['Tierd', 'Branden', 'This', 'is', 'L.', 'A.', '105', 'New', 'Even', 'L.', 'A.']

My question is: how can I keep the dotted content inside the parentheses (such as L.A.) together as a single list element?


Solution

  • The [\w']+[\w\.] pattern matches one or more word or ' chars followed by exactly one word or . char. Hence, a single match can contain at most one dot, and only as its last character, so L.A. is split into L. and A.

    I suggest using

    r"\w[\w'.]*"
    

    See the regex demo and a Regulex graph:

    [Figure: Regulex graph of \w[\w'.]*]

    Details

    • \w - a word char
    • [\w'.]* - 0 or more word, ' and . chars.
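
To make the difference concrete, here is a minimal sketch that runs both patterns on the sample string from the question. The variable names `original` and `suggested` are only illustrative, and the code assumes Python 3 (hence `print(...)` with parentheses).

```python
import re

strs = "Tierd-Branden This is (L.A.) 105  / New (Even L.A.A)"

# Pattern from the question: one or more word/' chars, then exactly one
# word or '.' char, so a dot can only ever be the last char of a match.
original = re.findall(r"[\w']+[\w.]", strs)

# Suggested pattern: one word char, then any run of word, ' and '.' chars,
# so dotted abbreviations stay in one piece.
suggested = re.findall(r"\w[\w'.]*", strs)

print(original)
# ['Tierd', 'Branden', 'This', 'is', 'L.', 'A.', '105', 'New', 'Even', 'L.', 'A.']
print(suggested)
# ['Tierd', 'Branden', 'This', 'is', 'L.A.', '105', 'New', 'Even', 'L.A.A']
```

Note that the suggested pattern also keeps a trailing dot attached to a word (as it does for L.A.), so a word at the end of a sentence would be returned with its period.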