I have a long string like the one below, and I want to extract all the items that follow Invalid items, so I expect the regex to return a list like this:
['abc.def.com', 'bar123', 'hello', 'world', '1212', '5566', 'aaaa']
I tried this pattern, but it returns one string per group rather than one per item:
import re
test = 'Valid items: (aaa.com; bbb.com); Invalid items: (abc.def.com;); Valid items: (foo123;); Invalid items: (bar123;); Valid items: (1234; 5678; abcd;); Invalid items: (hello; world; 1212; 5566; aaaa;)'
re.findall(r'Invalid items: \((.+?);\)', test)
# ['abc.def.com', 'bar123', 'hello; world; 1212; 5566; aaaa']
Is there a better way to do this with regex?
thanks
If you want to return all the matches individually using only a single findall, then you'll need a positive lookbehind, e.g. (?<=foo), and here it has to be variable-width. Unfortunately, Python's re module only supports fixed-width lookbehind. However, if you're willing to use the outstanding third-party regex module, then it can be done.
Regex:
(?<=Invalid items: \([^)]*)[^ ;)]+
Demonstration: https://regex101.com/r/p90Z81/1
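As a minimal sketch of how the pattern above could be used from Python, assuming the third-party regex package is installed (pip install regex) and reusing the test string from the question; the variable names pattern and items are only illustrative:

```python
import regex  # third-party package, not the stdlib re module

test = ('Valid items: (aaa.com; bbb.com); Invalid items: (abc.def.com;); '
        'Valid items: (foo123;); Invalid items: (bar123;); '
        'Valid items: (1234; 5678; abcd;); '
        'Invalid items: (hello; world; 1212; 5566; aaaa;)')

# The lookbehind requires "Invalid items: (" somewhere before the match,
# with no closing parenthesis in between, so only items inside an
# "Invalid items" group qualify. Each match is a run of characters that
# are not a space, semicolon, or closing parenthesis.
pattern = r'(?<=Invalid items: \([^)]*)[^ ;)]+'

items = regex.findall(pattern, test)
print(items)
# ['abc.def.com', 'bar123', 'hello', 'world', '1212', '5566', 'aaaa']
```

The regex module is designed to mirror the re API, so findall is called the same way; only the import changes.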
If there can be empty items, a small modification to the regex allows capture of these zero-width matches, as follows:
(?<=Invalid items: \([^)]*)(?:[^ ;)]+|(?<=\(| ))
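If installing a third-party module is not an option, the same list can probably be obtained with the standard re module in two passes. This is a different technique from the single-findall approach, offered only as a sketch; the names groups and items are illustrative, and it assumes items never contain spaces, semicolons, or parentheses:

```python
import re

test = ('Valid items: (aaa.com; bbb.com); Invalid items: (abc.def.com;); '
        'Valid items: (foo123;); Invalid items: (bar123;); '
        'Valid items: (1234; 5678; abcd;); '
        'Invalid items: (hello; world; 1212; 5566; aaaa;)')

# Pass 1: capture everything between the parentheses of each
# "Invalid items: (...)" group.
groups = re.findall(r'Invalid items: \(([^)]*)\)', test)

# Pass 2: split each captured group into its individual items.
items = [item for group in groups for item in re.findall(r'[^ ;]+', group)]

print(items)
# ['abc.def.com', 'bar123', 'hello', 'world', '1212', '5566', 'aaaa']
```

The trade-off is two regex passes instead of one, in exchange for having no dependency outside the standard library.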