My algorithm needs to iteratively shrink a set by removing an element, and do something with the element removed and with the shrinking set in each iteration. And:
By the way, the algorithm is the basic form of the Bron–Kerbosch algorithm. Smarter versions of that algorithm work faster (mostly) because they don't leave the choice of element arbitrary and I'd like to learn how much that effort pays off, compared to optimizing the pop operation.
Python sets have a pop
member that pretty much does that. In Scala and Go, picking and removing the "first" element of a hash set seems to work fine (where "first" corresponds to the iterator). In Rust, that is something like:
// split off an arbitrary element from a (non-empty) set
pub fn pop<T>(set: &mut HashSet<T>) -> T
where
T: Eq + Clone + std::hash::Hash,
{
let elt = set.iter().next().cloned().unwrap();
set.remove(&elt);
elt
}
This seems to be a performance bottleneck compared to other languages, but even to a seemingly naive way do this kind of iteration in Rust: copying the sequence, then popping elements in sequence. I benchmarked some implementations of a pop-like function on the playground but none perform well compared to the naive way.
At first, I thought I saw that removing an element is not expensive, but picking one with iter().next()
is. But on closer examination, that's not true, at least compared to other languages (*).
Using retain
understandably doesn't help: it always iterates the whole set. Are there any other alternatives?
(*) On closer examination, iter().next()
is quite cheap, as far as microbenchmarking can be trusted. Separate microbenchmarks say that picking an arbitrary element from a set costs (in nanoseconds on my system):
| Type of set | Number of elements in set instance
| | 100 | 10,000 | 1,000,000
| Rust HashSet | 2 | 2 | 2
| Rust BTreeSet | 11 | 12 | 13
| Go map[]struct{} | 27 | 31 | 94
| Python set | 125 | 125 | 125
I guess the same advice applies as in Can I randomly sample from a HashSet efficiently?: copy the set as a vector just to iterate over it as shown in the "sequenced" solution in the benchmark:
let seq: Vec<u32> = set.iter().cloned().collect();
for elt in seq {
set.remove(&elt);
That means this answer is not applicable if you need to shrink the set (pick an arbitrary element) only once or a few times, or if the set contents cannot be cheaply cloned.