Search code examples
pythonsortingsortedsetsortedcontainers

How to get SortedSet to update position of old value?


I have the following object that I would like to keep in a container that is sorted on insertion and does not contain duplicates, so I am using a SortedSet

from sortedcontainers import SortedSet, SortedList


class R():

    def __hash__(self):
        return hash(self.person_id)

    def __eq__(self, other):
        return self.__class__ == other.__class__ and self.person_id == other.person_id

    def __nq__(self, other):
        return not (self == other)

    def __lt__(self, other):
        return other.value < self.value

    def __init__(self, person_id, value):
        self.person_id = person_id
        self.value = value

    def __repr__(self):
        return "person: %s (%s)" % (self.person_id, self.value)

x = SortedSet()

x.add(R(13, 2))
x.add(R(17, 4))
x.add(R(11, 21))
x.add(R(7, -41))

print(x)

When I run this code I get the following output as expected:

SortedSet([person: 11 (21), person: 17 (4), person: 13 (2), person: 7 (-41)])

However if I added an extra duplicate element i.e. 17:

x.add(R(13, 2))
x.add(R(17, 4))
x.add(R(11, 21))
x.add(R(7, -41))
x.add(R(17, -67))

print(x)

I expect the R object with id 17 named person: 17 (4) to be moved to the back with value person: 17 (-67) like:

SortedSet([person: 11 (21), person: 13 (2), person: 7 (-41), person: 17 (-67)])

However nothing changes:

SortedSet([person: 11 (21), person: 17 (4), person: 13 (2), person: 7 (-41)])

How can I achieve the desired output as described using a SortedSet or any other container that is sorted on insertion and has no duplicates?


Solution

  • DeepSpace's answer covers making this work (if somewhat inefficiently), but I'm going to issue a frame challenge here: This is a bad design.

    Sets (the logical construct) are designed to store unique items. If something added to a set is equal to something already in it, there's no reason to replace the old item, because the old item and the new item are equivalent. If your class does not use a definition of equality in which equality implies substitutability (two equal instances could be use interchangably in all relevant ways), then the instances aren't suitable for use in a set. Even without SortedSet getting involved, using plain set, this wouldn't work, because set.add won't replace an item when you insert an "equal" item; they're both equivalent after all, so why do the extra work?

    When you need to have a concept of keys that can map to values, where the values for a given key can be changed later without knowing the original value, you want a mapping (dict-like), not a set (set-like).

    Kelly Bundy suggests that what you want may already exist in the sortedcollections package (ValueSortedDict), so I'd go with that if it works. Since sortedcontainers contains nothing that allows replacing values and sorting on the values, you'd have to go to a lot of work to add that behavior, roughly the same order of magnitude as implementing it yourself from scratch.


    Additional notes on why this doesn't work:

    On top of your use case being fundamentally unsuited to sets (the logical concept, not merely set itself), SortedSet itself is unusually inappropriate for your class, because implicitly relies on two invariants (only one of which is strictly required by Python, though the other is usually adhered to):

    1. Required by Python: __eq__ should be consistent with __hash__: If two items are equal, they must have the same hash, and, as much as possible, two unequal items should not have the same hash (ideally the hash should be based on the same fields __eq__ compares, but it's legal to base it on a subset of those fields)
    2. Required by SortedSet (and often assumed by other things that work with sorted objects): __eq__ should be consistent with __lt__ (and all other rich comparison operators): If a == b, then a < b and b < a should both be false; similarly, if a < b or b < a is true, then a != b. Most sorting stuff in Python sticks to __lt__ comparisons exclusively to allow inconsistent definitions, but if you stick the same objects in a tuple for comparisons, suddenly the lexicographic ordering rules means tuple's own __lt__ implementation relies on both the __lt__ and __eq__ of your class, so in practice you want them consistent anyway.

    Your class violates #2; the sorting rules are completely unrelated to the definition of equality. SortedSet muddles along here, determining uniqueness based on __hash__+__eq__ and ordering with __lt__, but in certain circumstances (e.g. when removing elements) it relies on __lt__ being consistent with __eq__. Specifically, after removing from the internal set (using __hash__+__eq___) it then removes from the internal SortedList, which bisects to find the element to remove, using __lt__, and confirms it found the right element with an equality check, using __eq__. Since __eq__ and __lt__ are inconsistent (they'd only match if you were trying to remove an R with the same person_id and value), this never finds the value it's trying to remove, and it raises an exception.