Say there is a list
["element1 abc, 1999 mno", "element2 abc", "element3 xyz, 2019 pqr", "element4 xyz", "element5 xyz, 1999 wasd"]
I want to truncate each item of the list right after a pattern, if that pattern is present in the item.
Let's say the patterns are ["xyz, \d{4}", "abc, \d{4}"]
The resulting list would be
["element1 abc, 1999", "element2 abc", "element3 xyz, 2019", "element4 xyz", "element5 xyz, 1999"]
Items at positions 0, 2 and 4 were truncated because the patterns "xyz," + year and "abc," + year were encountered.
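Below is my current code:
import re

ls = ["element1 abc, 1999 mno", "element2 abc", "element3 xyz, 2019 pqr", "element4 xyz", "element5 xyz, 1999 wasd"]
outls = []
# Raw strings keep "\d" from being treated as an invalid string escape.
pattern = [r"xyz, \d{4}", r"abc, \d{4}"]
for i in ls:
    appendTracker = 1  # stays 1 when no pattern matches this item
    for j in pattern:
        match = re.search(j, i)
        if match:
            matched = match.group()
            appendTracker = 0
            # Keep everything up to and including the matched text.
            outls.append(i[:i.rfind(matched)+len(matched)])
            break
    if appendTracker:
        outls.append(i)  # no pattern found, keep the item unchanged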
Is there a better way to do this? Time is not a big issue, but I do want to reduce the number of lines of code.
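As your patterns have a fixed length, the words following a pattern can be found with a positive lookbehind, (?<=...). The fixed length matters because Python's re module only accepts lookbehinds of fixed width.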
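To illustrate the fixed-width requirement, here is a small sketch. The helper name `compiles` and the second pattern (with the shorter alternative "ab") are made up for this illustration and are not part of the question.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Both alternatives are 3 characters long, so the lookbehind has a fixed width.
fixed = r"(?<=(xyz|abc), \d{4}).*"
# Hypothetical variant: "xyz" and "ab" differ in length, so the width varies.
variable = r"(?<=(xyz|ab), \d{4}).*"

print(compiles(fixed))     # True
print(compiles(variable))  # False
```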
Code
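import re

ls = ["element1 abc, 1999 mno", "element2 abc", "element3 xyz, 2019 pqr", "element4 xyz", "element5 xyz, 1999 wasd"]
# Delete everything that follows "xyz, <year>" or "abc, <year>"; the lookbehind leaves the pattern itself in place.
ans = [re.sub(r"(?<=(xyz|abc), \d{4}).*", "", s) for s in ls]
Output
print(ans)
['element1 abc, 1999',
'element2 abc',
'element3 xyz, 2019',
'element4 xyz',
'element5 xyz, 1999']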
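If the patterns might not all have the same length, a possible alternative is to keep the pattern list from the question, join it into one regex, and slice each string at the end of the match. This is only a sketch: the names `combined` and `truncate_after_match` are my own, and it cuts after the first (leftmost) match, which gives the same result as the question's code for the sample data but may differ when the matched text occurs more than once.

```python
import re

ls = ["element1 abc, 1999 mno", "element2 abc", "element3 xyz, 2019 pqr",
      "element4 xyz", "element5 xyz, 1999 wasd"]
patterns = [r"xyz, \d{4}", r"abc, \d{4}"]

# Wrap each pattern in a non-capturing group and join them with "|",
# so a single search tries all of them.
combined = re.compile("|".join(f"(?:{p})" for p in patterns))

def truncate_after_match(s):
    """Cut s right after the first pattern match; return s unchanged if none."""
    m = combined.search(s)
    return s[:m.end()] if m else s

ans = [truncate_after_match(s) for s in ls]
print(ans)
# ['element1 abc, 1999', 'element2 abc', 'element3 xyz, 2019', 'element4 xyz', 'element5 xyz, 1999']
```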