Indices of duplicate lists in a nested list

I am trying to solve a problem that is a part of my genome alignment project. The problem goes as follows: if given a nested list

y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]

extract indices of unique lists into a nested list again.

For example, the output for the above nested list should be

[[0,1,7],[2],[3],[4,5],[6]].

This is because list [1,2,3] is present in 0,1,7th index positions, [3,4,5] in 2nd index position and so on.

Since I will be dealing with large lists, what could be the most optimal way of achieving this in Python?

Solution

You could create an dictionary (or OrderedDict if on older pythons). The keys of the dict will be tuples of the sub-lists and the values will be an array of indexes. After looping through, the dictionary values will hold your answer:

from collections import OrderedDict

y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]

lookup = OrderedDict()
for idx,l in enumerate(y):
    lookup.setdefault(tuple(l), []).append(idx)

list(lookup.values())
# [[0, 1, 7], [2], [3], [4, 5], [6]]