Search code examples
pythoncpythonfrozenset

How is frozenset equality implemented in Python?


How is frozenset equality implemented in CPython? In particular, I want to know how individual elements in the fronzenset are compared with each other and their total time complexity.

I took a look at set and frozenset difference in implementation and https://stackoverflow.com/a/51778365/5792721.

The answer in the second link covers steps before comparing individual elements in the set, but missing the actual comparison algorithm.


Solution

  • The implementation in question can be found here. The algorithm goes something like this, given that we want to check if two sets v and w are equal:

    1. Check if their sizes are equal. If they are not, return false.

    This references the size attribute of the PySetObject, so it's a constant time operation.

    1. Check if both sets are frozensets. If so, and their hashes are different, return false.

    Again, this references the hash attribute of the PySetObject, which is a constant time operation. This is possible because frozensets are immutable objects and hence have their hashes calculated when they are created.

    1. Check if w is a subset of v.

    This is done by iterating through each element of v and checking if it is present in w.

    The iteration must be performed over all of v, and is therefore at worst linear in its size (if a differing element is found in the last position).

    Membership testing for sets in general takes constant time in the average case and linear time in the worst case; combining the two gives time linear in the size of v in the average case and time proportional to len(v) * len(w) in the worst case.

    This is, in a sense, the average case of the worst case, since it assumes that the first two checks always return true. If your data only rarely tends to have equal-size sets which also have the same hashes but contain different objects, then in the average case the comparison will still run in constant time.