Search code examples
c++algorithmsortingdata-structurescounting

Is there a data structure like a C++ std set which also quickly returns the number of elements in a range?


In a C++ std::set (often implemented using red-black binary search trees), the elements are automatically sorted, and key lookups and deletions in arbitrary positions take time O(log n) [amortised, i.e. ignoring reallocations when the size gets too big for the current capacity].

In a sorted C++ std::vector, lookups are also fast (actually probably a bit faster than std::set), but insertions are slow (since maintaining sortedness takes time O(n)).

However, sorted C++ std::vectors have another property: they can find the number of elements in a range quickly (in time O(log n)).

i.e., a sorted C++ std::vector can quickly answer: how many elements lie between given x,y?

std::set can quickly find iterators to the start and end of the range, but gives no clue how many elements are within.

So, is there a data structure that allows all the speed of a C++ std::set (fast lookups and deletions), but also allows fast computation of the number of elements in a given range?

(By fast, I mean time O(log n), or maybe a polynomial in log n, or maybe even sqrt(n). Just as long as it's faster than O(n), since O(n) is almost the same as the trivial O(n log n) to search through everything).

(If not possible, even an estimate of the number to within a fixed factor would be useful. For integers a trivial upper bound is y-x+1, but how to get a lower bound? For arbitrary objects with an ordering there's no such estimate).

EDIT: I have just seen the related question, which essentially asks whether one can compute the number of preceding elements. (Sorry, my fault for not seeing it before). This is clearly trivially equivalent to this question (to get the number in a range, just compute the start/end elements and subtract, etc.)

However, that question also allows the data to be computed once and then be fixed, unlike here, so that question (and the sorted vector answer) isn't actually a duplicate of this one.


Solution

  • The data structure you're looking for is an Order Statistic Tree

    It's typically implemented as a binary search tree in which each node additionally stores the size of its subtree.

    Unfortunately, I'm pretty sure the STL doesn't provide one.