python regex list replace nested-lists

Remove a particular word in a list of lists if it appears after a set of words


I have a list of words, list1=['duck','crow','hen','sparrow'], and a list of sentences, list2=[['The crow eats'],['Hen eats blue seeds'],['the duck is cute'],['she eats veggies']]. I want to remove every occurrence of the word 'eats' when it appears immediately after any of the words in list1.

Desired output = [['The crow'],['Hen blue seeds'],['the duck is cute'],['she eats veggies']]

def remove_eats(test):
  for i in test:
    for j in i:
     for word in list1:
        j=j.replace(word + " eats", word)
        print(j)
        break

remove_eats(list2)

The replace method is not working on these strings. Could you help me out? Is it possible with regex?


Solution

  • You can use a regex such as the one below, which alternates between positive look-behinds, one per word.

    (?:(?<=[Dd]uck)|(?<=[Cc]row)|(?<=[Hh]en)|(?<=[Ss]parrow))\s+eats?\b
    

    Python example, using the built-in re module:

    import re
    
    list1 = ['duck', 'crow', 'hen', 'sparrow']
    
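    # Python's re needs fixed-width look-behinds, so build one per word;
    # each accepts the word's first letter in either case.
    look_behinds = '|'.join(f'(?<=[{w[0].swapcase()}{w[0]}]{w[1:]})'
                            for w in list1)
    
    # Match whitespace plus "eat"/"eats" only when it directly follows a listed word.
    EATS_RE = re.compile(rf'(?:{look_behinds})\s+eats?\b')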
    
    sentences = [['The crow eats'],
                 ['Hen eats blue seeds'],
                 ['the duck is cute'],
                 ['she eats veggies']]
    
    # No count argument, so every match in a sentence is removed.
    repl_sentences = [[EATS_RE.sub('', s) for s in x] for x in sentences]
    print(repl_sentences)
    

    Out:

    [['The crow'], ['Hen blue seeds'], ['the duck is cute'], ['she eats veggies']]
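
One caveat with the look-behind approach: (?<=[Hh]en) only checks the three characters right before the whitespace, so 'then eats' or 'chicken eats' would also lose 'eats'. A possible variant, sketched below, anchors the word with \b and puts it back through a capture group instead of using look-behinds. The names AFTER_WORD_RE and remove_eats are only illustrative, and re.IGNORECASE is an assumption on my part: it makes the whole match case-insensitive (so 'DUCK EATS' matches too), which is broader than the first-letter handling above.

```python
import re

# Words taken from the question; the names below are hypothetical.
list1 = ['duck', 'crow', 'hen', 'sparrow']

# \b keeps 'hen' from matching inside 'then' or 'chicken';
# re.escape guards against regex metacharacters in the word list.
words = '|'.join(re.escape(w) for w in list1)
AFTER_WORD_RE = re.compile(rf'\b({words})\s+eats\b', re.IGNORECASE)


def remove_eats(sentences):
    """Return a new nested list with 'eats' dropped after any listed word."""
    # \1 re-inserts the captured word, so only the whitespace and 'eats' go away.
    return [[AFTER_WORD_RE.sub(r'\1', s) for s in group] for group in sentences]


list2 = [['The crow eats'], ['Hen eats blue seeds'],
         ['the duck is cute'], ['she eats veggies']]

print(remove_eats(list2))
print(remove_eats([['then eats the hen eats']]))
```

Since re.sub replaces every match by default, this also handles sentences where the pattern occurs more than once. As for the loop in the question, it fails for a different reason: it breaks after trying only the first word, it never stores the replaced string back into the list, and str.replace is case-sensitive, so 'Hen eats' is never touched.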