Search code examples
pythonlistindexingsetlookup

Get the index of the set containing a character from a list, for every character, in less than O(n**2)


I have two elements, x and Y. x is a very long list of characters. Y is an even longer list, such that every element of it is a set. For every element in x, I would like to find the index of the set in Y that contains it.

  • The sets are disjoint - there is no element that appears in more than one set.
  • All the elements from x are covered by the sets in Y.
  • There is not a y in Y such that it contains an element that is not on x.
Y = [{"a", "b"}, {"c", "d"}, {"e", "f"}]
x = ["a", "b", "c", "d", "e", "f", "a", "f"]

# Desired results:
[('a', 0), 
 ('b', 0), 
 ('c', 1), 
 ('d', 1),
 ('e', 2), 
 ('f', 2),
 ('a', 0),
 ('f', 2)]

Frankly, I am ok with any shape of such results, as long as the computation is efficient.


Solution

  • You can create an index first to speed up the search:

    Y = [{"a", "b"}, {"c", "d"}, {"e", "f"}]
    x = ["a", "b", "c", "d", "e", "f", "a", "f"]
    
    index = {v: i for i, s in enumerate(Y) for v in s}
    out = [(v, index[v]) for v in x]
    print(out)
    

    Prints:

    [("a", 0), ("b", 0), ("c", 1), ("d", 1), ("e", 2), ("f", 2), ("a", 0), ("f", 2)]