Tags: python, python-2.7, duplicates, duplicate-data, python-itertools

Duplicate removal in a list of lists


[
[0.074, 0.073, 0.072, 0.03, 0.029, 0.024, 0.021, 0.02], 
[0.02, 0.02, 0.015], 
[0.026, 0.026, 0.02, 0.02, 0.02, 0.015], 
[0.021, 0.021, 0.02, 0.017], [0.077, 0.076, 0.074, 0.055, 0.045, 0.021], 
[0.053, 0.052, 0.051, 0.023, 0.022], 
[0.016, 0.016]
]

The above is the output of a list of lists, data['stock'].

I want to remove the duplicate values within each sub-list but can't figure out a way to do it. If you look at the second sub-list, you will notice it has three elements (0.02, 0.02 and 0.015); the first two are duplicates, so one of them is redundant.

Is there a way I could check each sub-list and get rid of the duplicate values while preserving the order?

Please advise!


Solution

  • Looks like the sublists are already sorted, so you can apply itertools.groupby:

    In [1]: data = [
       ...: [0.074, 0.073, 0.072, 0.03, 0.029, 0.024, 0.021, 0.02], 
       ...: [0.02, 0.02, 0.015], 
       ...: [0.026, 0.026, 0.02, 0.02, 0.02, 0.015], 
       ...: [0.021, 0.021, 0.02, 0.017], [0.077, 0.076, 0.074, 0.055, 0.045, 0.021], 
       ...: [0.053, 0.052, 0.051, 0.023, 0.022], 
       ...: [0.016, 0.016]
       ...: ]
    
    In [2]: from itertools import groupby
    
    In [3]: [[k for k, g in groupby(subl)] for subl in data]
    Out[3]: 
    [[0.074, 0.073, 0.072, 0.03, 0.029, 0.024, 0.021, 0.02],
     [0.02, 0.015],
     [0.026, 0.02, 0.015],
     [0.021, 0.02, 0.017],
     [0.077, 0.076, 0.074, 0.055, 0.045, 0.021],
     [0.053, 0.052, 0.051, 0.023, 0.022],
     [0.016]]