
Truncate list items


Say there is a list:

["element1 abc, 1999 mno", "element2 abc", "element3 xyz, 2019 pqr", "element4 xyz", "element5 xyz, 1999 wasd"]

I want to truncate each item of the list right after a pattern, if one is present. Let's say the patterns are ["xyz, \d{4}", "abc, \d{4}"]. The resulting list would be:

["element1 abc, 1999", "element2 abc", "element3 xyz, 2019", "element4 xyz", "element5 xyz, 1999"]

Items at positions 0, 2 and 4 were truncated because the patterns "xyz, " + year and "abc, " + year were encountered.

Below is my current code

import re

ls = ["element1 abc, 1999 mno", "element2 abc", "element3 xyz, 2019 pqr", "element4 xyz", "element5 xyz, 1999 wasd"]
outls = []
pattern = [r"xyz, \d{4}", r"abc, \d{4}"]  # raw strings, so \d is not treated as a string escape

for i in ls:
    appendTracker = 1
    for j in pattern:
        match = re.search(j, i)
        if match:
            matched = match.group()
            appendTracker = 0
            outls.append(i[:i.rfind(matched)+len(matched)])
            break
        
    if appendTracker:
        outls.append(i)

Is there a better way to do this? Time is not much of an issue, but I do want to reduce the number of lines of code.


Solution

  • As your patterns have a fixed length, everything following a pattern can be matched with a positive lookbehind, (?<=...), and replaced with an empty string.

    Code

    import re

    ls = ["element1 abc", "element2 abc", "element3 xyz, 2019 pqr", "element4 xyz", "element5 xyz, 1999 wasd"]
    ans = [re.sub(r"(?<=(xyz|abc)\, \d{4}).*", "", s) for s in ls]
    

    Output

    print(ans)
    
    ['element1 abc',
     'element2 abc',
     'element3 xyz, 2019',
     'element4 xyz',
     'element5 xyz, 1999']
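
The lookbehind approach relies on the patterns having a fixed width, because Python's re module rejects variable-width lookbehinds. As a hedged alternative, here is a minimal sketch that keeps the pattern list from the question and captures the match instead of looking behind it, so it should also work for patterns of varying length. The names `cutter` and `truncate_after` are made up for this illustration and do not come from the question or the answer.

```python
import re

# Pattern list from the question, written as raw strings.
patterns = [r"xyz, \d{4}", r"abc, \d{4}"]

# Join the patterns into one alternation inside a capturing group,
# followed by ".*" to swallow the rest of the string.
# Each pattern is wrapped in (?:...) so an inner "|" cannot leak out.
cutter = re.compile("(" + "|".join(f"(?:{p})" for p in patterns) + ").*")


def truncate_after(items):
    """Cut each string right after its leftmost pattern match.

    Strings without a match are returned unchanged.
    """
    # \1 puts the matched pattern text back and drops everything after it.
    return [cutter.sub(r"\1", s, count=1) for s in items]


ls = ["element1 abc, 1999 mno", "element2 abc", "element3 xyz, 2019 pqr",
      "element4 xyz", "element5 xyz, 1999 wasd"]

truncated = truncate_after(ls)
print(truncated)
# ['element1 abc, 1999', 'element2 abc', 'element3 xyz, 2019', 'element4 xyz', 'element5 xyz, 1999']
```

This is run on the list from the question, so the first item is truncated as well. One difference from the question's loop: it cuts after the leftmost match in the string, while the original code tries the patterns in list order and uses rfind on the matched text.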