How is frozenset equality implemented in CPython? In particular, I want to know how individual elements in the fronzenset are compared with each other and their total time complexity.
I took a look at set and frozenset difference in implementation and https://stackoverflow.com/a/51778365/5792721.
The answer in the second link covers steps before comparing individual elements in the set, but missing the actual comparison algorithm.
The implementation in question can be found here. The algorithm goes something like this, given that we want to check if two sets
v
and w
are equal:
false
. This references the size
attribute of the PySetObject
, so it's a constant time operation.
sets
are frozensets
. If so, and their hashes are different, return false
.Again, this references the hash
attribute of the PySetObject
, which is a constant time operation. This is possible because frozensets
are immutable objects and hence have their hashes calculated when they are created.
w
is a subset of v
.This is done by iterating through each element of v
and checking if it is present in w
.
The iteration must be performed over all of v
, and is therefore at worst linear in its size (if a differing element is found in the last position).
Membership testing for sets
in general takes constant time in the average case and linear time in the worst case; combining the two gives time linear in the size of v
in the average case and time proportional to len(v) * len(w)
in the worst case.
This is, in a sense, the average case of the worst case, since it assumes that the first two checks always return true
. If your data only rarely tends to have equal-size sets
which also have the same hashes but contain different objects, then in the average case the comparison will still run in constant time.