Search code examples
pythonperformancenested-listsindices

Indices of duplicate lists in a nested list


I am trying to solve a problem that is a part of my genome alignment project. The problem goes as follows: if given a nested list

y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]

extract indices of unique lists into a nested list again.

For example, the output for the above nested list should be

[[0,1,7],[2],[3],[4,5],[6]].

This is because list [1,2,3] is present in 0,1,7th index positions, [3,4,5] in 2nd index position and so on.

Since I will be dealing with large lists, what could be the most optimal way of achieving this in Python?


Solution

  • You could create an dictionary (or OrderedDict if on older pythons). The keys of the dict will be tuples of the sub-lists and the values will be an array of indexes. After looping through, the dictionary values will hold your answer:

    from collections import OrderedDict
    
    y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]
    
    lookup = OrderedDict()
    for idx,l in enumerate(y):
        lookup.setdefault(tuple(l), []).append(idx)
    
    list(lookup.values())
    # [[0, 1, 7], [2], [3], [4, 5], [6]]