python, string, tokenize

Remove tokens beginning with specific characters


Hi, I am trying to remove every token that starts with one of my predefined prefixes (prefixes). Below is my code, which is not removing the tokens as expected.

prefixes = ('#', '@')
tokens = [u'order', u'online', u'today', u'ebay', u'store', u'#hamandcheesecroissant', u'#whoopwhoop', u'\u2026']

for token in tokens:
    if token.startswith(prefixes):
        tokens.remove(token)

Solution

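  • Removing items from a list while iterating over it does not work reliably: each removal shifts the remaining items one position to the left, so the loop skips the item that immediately follows the removed one.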

    You can use a list comprehension to build a filtered list instead:

    tokens = [token for token in tokens if not token.startswith(prefixes)]
    

    Or create another list, and then append the items you want to keep to that list instead:

    new_tokens = []
    
    for token in tokens:
        if not token.startswith(prefixes):
            new_tokens.append(token)
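
To see why the original loop misses a token, and for cases where the same list object has to be updated rather than replaced, here is a minimal sketch. The names original, buggy and alias are illustrative and not part of the question; the slice assignment is one possible in-place variant of the list comprehension above, not the only way to do it.

```python
prefixes = ('#', '@')
original = [u'order', u'online', u'today', u'ebay', u'store',
            u'#hamandcheesecroissant', u'#whoopwhoop', u'\u2026']

# Reproduce the problem on a copy (hypothetical name: buggy).
# After u'#hamandcheesecroissant' is removed, u'#whoopwhoop' slides into
# its slot and the loop moves on without ever testing it.
buggy = list(original)
for token in buggy:
    if token.startswith(prefixes):
        buggy.remove(token)
# buggy still contains u'#whoopwhoop'

# In-place variant: slice assignment replaces the contents of the same
# list object, so any other reference to it (alias) sees the result.
tokens = list(original)
alias = tokens
tokens[:] = [token for token in tokens if not token.startswith(prefixes)]
```

Plain reassignment (tokens = [...]) is enough when nothing else holds a reference to the list; the slice form is only worth using when other code shares the same list object.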