I am trying to solve a problem that is a part of my genome alignment project. The problem goes as follows: if given a nested list
y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]
extract indices of unique lists into a nested list again.
For example, the output for the above nested list should be
[[0,1,7],[2],[3],[4,5],[6]]
.
This is because list [1,2,3]
is present in 0,1,7th
index positions, [3,4,5]
in 2nd index position and so on.
Since I will be dealing with large lists, what could be the most optimal way of achieving this in Python?
You could create an dictionary (or OrderedDict if on older pythons). The keys of the dict will be tuples of the sub-lists and the values will be an array of indexes. After looping through, the dictionary values will hold your answer:
from collections import OrderedDict
y = [[1,2,3],[1,2,3],[3,4,5],[6,5,4],[4,2,5],[4,2,5],[1,2,8],[1,2,3]]
lookup = OrderedDict()
for idx,l in enumerate(y):
lookup.setdefault(tuple(l), []).append(idx)
list(lookup.values())
# [[0, 1, 7], [2], [3], [4, 5], [6]]