Tags: python, python-2.7, pandas, frozenset

Maintaining the order of the elements in a frozen set


I have a list of tuples, each containing one string and two integers. The list looks like this:

x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]

The list contains thousands of such tuples. To get the unique combinations, I can convert each tuple to a frozenset and collect the results in a set:

y = set(map(frozenset, x))

This gives me the following result:

{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}

I know that a set is an unordered data structure, so this is expected, but I want to preserve the order of the elements within each tuple so that I can then insert them into a pandas DataFrame. The DataFrame should look like this:

 Name  Marks1  Marks2
0    a       1       2
1    b       3       4
2    x       5       6

Solution

  • Instead of working with the set of frozensets directly, you could use a set only as a helper data structure, as in the unique_everseen recipe from the itertools documentation (copied verbatim; on Python 2, use itertools.ifilterfalse in place of filterfalse):

    from itertools import filterfalse
    
    def unique_everseen(iterable, key=None):
        "List unique elements, preserving order. Remember all elements ever seen."
        # unique_everseen('AAAABBBCCDAABBB') --> A B C D
        # unique_everseen('ABBCcAD', str.lower) --> A B C D
        seen = set()
        seen_add = seen.add
        if key is None:
            for element in filterfalse(seen.__contains__, iterable):
                seen_add(element)
                yield element
        else:
            for element in iterable:
                k = key(element)
                if k not in seen:
                    seen_add(k)
                    yield element
    

    With key=frozenset, this solves the problem:

    >>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
    
    >>> list(unique_everseen(x, key=frozenset))
    [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
    

    This returns the original tuples unchanged and preserves their relative order. When two tuples contain the same members, the first one encountered is kept.
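
The same idea can also be written without a generator. This is only a minimal sketch, not part of the itertools recipe: `unique_by_members` is a hypothetical helper name, and it assumes every tuple is hashable element by element. It uses `collections.OrderedDict` so that first-seen order holds on Python 2.7 as well as Python 3.

```python
from collections import OrderedDict


def unique_by_members(rows):
    """Return the first tuple seen for each distinct set of members."""
    first_seen = OrderedDict()
    for row in rows:
        # setdefault stores the row only if this frozenset key is new,
        # so later permutations such as ('a', 2, 1) are ignored.
        first_seen.setdefault(frozenset(row), row)
    return list(first_seen.values())


x = [('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1)]
rows = unique_by_members(x)
print(rows)  # [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
```

Because each row is still an ordinary tuple in its original order, the result should be usable directly as row data, for example with `pandas.DataFrame(rows, columns=['Name', 'Marks1', 'Marks2'])` to get the layout shown in the question.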