python-3.x list nested bigdata unique

How to get unique values in nested list along single column?


I need to extract only the unique sublists from a nested list, where uniqueness is decided by the first element and the first occurrence is kept. For example:

inp = [['a','b'], ['a','d'], ['e','f'], ['g','h'], ['e','i']]
out = [['a','b'], ['e','f'], ['g','h']]

My method is to break the list into two lists and check the elements individually.

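lis = [['a','b'], ['a','d'], ['e','f'], ['g','h']]
lisa = []  # first elements seen so far
lisb = []  # second element paired with each entry in lisa
for i in lis:
    # `not in` on a list is a linear scan of lisa
    if i[0] not in lisa:
        lisa.append(i[0])
        lisb.append(i[1])
# rebuild the sublists from the two parallel lists
out = []
for i in range(len(lisa)):
    temp = [lisa[i],lisb[i]]
    out.append(temp)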

This is an expensive operation when dealing with a list of 1,000,000+ sublists. Is there a better method?


Solution

  • Use a memory-efficient generator function with an auxiliary set that tracks the first elements already seen, yielding only the first sublist for each one (take first unique):

    def gen_take_first(s):
        seen = set()
        for sub_l in s:
            if sub_l[0] not in seen:
                seen.add(sub_l[0])
                yield sub_l
    
    inp = [['a','b'], ['a','d'], ['e','f'], ['g','h'], ['e','i']]
    out = list(gen_take_first(inp))
    print(out)
    

    [['a', 'b'], ['e', 'f'], ['g', 'h']]
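
  • A possible alternative, sketched here under assumptions rather than taken from the original answer: a dict keyed on the first element should give the same result, since dicts preserve insertion order in Python 3.7+. The helper name take_first_by_key is hypothetical, and the sketch assumes every first element is hashable. Unlike the generator, it holds all unique sublists in memory before returning anything.

```python
def take_first_by_key(rows):
    # Hypothetical helper: maps each first element to the first sublist seen with it.
    first_seen = {}
    for row in rows:
        # setdefault stores row only if row[0] is not already a key,
        # so later sublists with the same first element are ignored.
        first_seen.setdefault(row[0], row)
    # Assumes Python 3.7+, where dicts keep insertion order.
    return list(first_seen.values())

inp = [['a','b'], ['a','d'], ['e','f'], ['g','h'], ['e','i']]
out = take_first_by_key(inp)
print(out)
```

    The generator version is likely the better fit when the results are consumed one at a time; the dict version may be convenient when a lookup by first element is needed afterwards.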