I've seen people say that set
objects in python have O(1) membership-checking. How are they implemented internally to allow this? What sort of data structure does it use? What other implications does that implementation have?
Every answer here was really enlightening, but I can only accept one, so I'll go with the closest answer to my original question. Thanks all for the info!
According to this thread:
Indeed, CPython's sets are implemented as something like dictionaries with dummy values (the keys being the members of the set), with some optimization(s) that exploit this lack of values
So basically a set
uses a hashtable as its underlying data structure. This explains the O(1)
membership checking, since looking up an item in a hashtable is an O(1)
operation, on average.
If you are so inclined you can even browse the CPython source code for set
which, according to Achim Domma, was originally mostly a cut-and-paste from the dict
implementation.
Note: Nowadays, set
and dict
's implementations have diverged significantly, so the precise behaviors (e.g. arbitrary order vs. insertion order) and performance in various use cases differs; they're still implemented in terms of hashtables, so average case lookup and insertion remains O(1)
, but set
is no longer just "dict
, but with dummy/omitted values".