Tags: python, string, list, startswith

Remove later strings that start with the same prefix as an earlier one in a Python list


I have a list like this:

['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']

I want to remove every string that comes after an earlier string starting with the same 4 characters. For example, 'a b e' would be removed because 'a b d' occurs before it and both start with 'a b '.

The new list should look like this:

['a b d', 'c d j', 'w x y']

How can I do this?

(NOTE: The list is sorted, as per @Martijn Pieters' comment)


Solution

  • Using a generator function to remember the starts:

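    def remove_starts(lst):
        seen = []  # 4-character prefixes of the strings yielded so far
        for elem in lst:
            # str.startswith accepts a tuple and tests each entry in turn;
            # an empty tuple never matches, so the first element passes
            if elem.startswith(tuple(seen)):
                continue
            yield elem
            seen.append(elem[:4])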
    

    So the function skips anything that starts with one of the prefixes in seen, and appends the first 4 characters of each string it does let through to that list.

    Demo:

    >>> lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
    >>> def remove_starts(lst):
    ...     seen = []
    ...     for elem in lst:
    ...         if elem.startswith(tuple(seen)):
    ...             continue
    ...         yield elem
    ...         seen.append(elem[:4])
    ...
    >>> list(remove_starts(lst))
    ['a b d', 'c d j', 'w x y']
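
A possible variant, sketched here as an assumption rather than as part of the original answer: keep the prefixes in a set and test membership directly, which avoids rebuilding a tuple on every iteration. The name `remove_starts_set` and the `width` parameter are hypothetical. It behaves like the generator above whenever every string has at least 4 characters; shorter strings are compared as whole keys instead of being treated as prefixes.

```python
def remove_starts_set(lst, width=4):
    """Yield each string whose first `width` characters have not been seen yet."""
    seen = set()  # prefixes of the strings already yielded
    for elem in lst:
        prefix = elem[:width]
        if prefix in seen:
            continue  # an earlier string already claimed this prefix
        seen.add(prefix)
        yield elem


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(list(remove_starts_set(lst)))  # ['a b d', 'c d j', 'w x y']
```

Like the list-based version, this does not require the input to be sorted.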
    

    If your input is sorted, this can be simplified to:

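    def remove_starts(lst):
        seen = ()  # empty tuple: startswith(()) is False, so the first element passes
        for elem in lst:
            if elem.startswith(seen):
                continue
            yield elem
            seen = elem[:4]  # only the most recent prefix is kept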
    

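    Because strings with the same prefix are adjacent in sorted input, only the prefix of the most recently yielded string needs to be tested, which saves on prefix-testing.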
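
For sorted input, the same idea can also be expressed with `itertools.groupby`, which groups consecutive items that share a key. This is only a sketch of an alternative, not the answer's own method; the name `first_of_each_prefix` and the `width` parameter are assumptions made for illustration.

```python
from itertools import groupby


def first_of_each_prefix(lst, width=4):
    """Return the first string of every run of consecutive equal prefixes."""
    # groupby only merges adjacent items, so equal prefixes must be contiguous
    # (as they are in sorted input); each group is never empty, so next() is safe.
    return [next(group) for _, group in groupby(lst, key=lambda s: s[:width])]


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(first_of_each_prefix(lst))  # ['a b d', 'c d j', 'w x y']
```

If the same prefix can reappear later in unsorted input, this keeps one string per run, so a version that remembers every prefix is the safer choice there.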