Search code examples
pythonpython-2.7subsetpython-itertools

Fastest way to remove subsets of lists from a list in Python


Suppose I have a list of lists like the one below (the actual list is much longer):

fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]

In this case, all the items in the lists ['banana', 'pear'], ['apple', 'pear'] and ['apple', 'pear', 'banana'] are contained in the list ['apple', 'pear', 'banana', 'watermelon'] (the order of items does not matter), so I would like to remove ['banana', 'pear'], ['apple', 'pear'], and ['apple', 'pear', 'banana'] as they are subsets of ['apple', 'pear', 'banana', 'watermelon'].

My current solution is shown below. I first use ifilter and imap to create a generator for the supersets that each list might have. Then for those cases that do have supersets, I use compress and imap to drop them.

from itertools import imap, ifilter, compress

supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)


new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
new_list
#[['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]

I wonder if there are more efficient ways to do this?


Solution

  • I don't know if it is faster but this is easier to read (to me anyway):

    sets={frozenset(e) for e in fruits}  
    us=set()
    while sets:
        e=sets.pop()
        if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
            continue
        else:
            us.add(e)   
    

    Update

    It is fast. Faster still is to use a for loop. Check timings:

    fruits = [['apple', 'pear'],
            ['apple', 'pear', 'banana'],
            ['banana', 'pear'],
            ['pear', 'pineapple'],
            ['apple', 'pear', 'banana', 'watermelon']]
    
    from itertools import imap, ifilter, compress    
    
    def f1():              
        sets={frozenset(e) for e in fruits}  
        us=[]
        while sets:
            e=sets.pop()
            if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
                continue
            else:
                us.append(list(e))   
        return us           
    
    def f2():
        supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
        new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
        return new_list
    
    def f3():
        return filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)
    
    def f4():              
        sets={frozenset(e) for e in fruits}  
        us=[]
        for e in sets:
            if any(e < s for s in sets):
                continue
            else:
                us.append(list(e))   
        return us              
    
    if __name__=='__main__':
        import timeit     
        for f in (f1, f2, f3, f4):
            print f.__name__, timeit.timeit("f()", setup="from __main__ import f, fruits"), f()  
    

    On my machine on Python 2.7:

    f1 8.09958791733 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
    f2 15.5085151196 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
    f3 11.9473619461 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
    f4 5.87942910194 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]