time-complexity big-o discrete-mathematics quadtree aabb

Complexity of quad tree's range search

I've already searched for this information. Found this:

https://groups.google.com/forum/#!topic/uw.cs.cs240/MGfrsvKAiMA

and this:

How can worst case complexity of quad tree O(N)?

, but I don't believe it answers my problem. The first one, perhaps, but I don't understand the explanation. Up to the point.

I've got a quad tree over a discrete space (2d square table of entities). It is a region tree, as explained in english Wikipedia page (https://en.wikipedia.org/wiki/Quadtree#Types). Each region can hold only one entity. Each entity has its discrete coordinates.

I have implemented a method to find all entities in certain (discrete) AABB, working exactly as the queryRange() function in the aforementioned Wiki page.

My question is: what is the time complexity for this queryRange() function?

I've tried figuring it out myself, but it seems to depend on many various factors, such as: depth of the tree, number of elements in the tree, size of given AABB. I think in its core it is related to number of sub-trees visited by queryRange() recursion.

Also I'd be thankful for any credible sources on that. I'm writing a master thesis and I need citations. I really can't seem to google anything good on this topic though.

Solution

The time complexity of queries depends on many aspects:

The data distribution
The order of insertion (unless you rebalance the tree)
Whether you store points or rectangles (points are much more common as far as I can tell)
The type of queries you are performing

I'm no expert on complexity calculation, so there may be some way how average complexity can be calculated, but it may not be useful to a given real scenario.

Maximum complexity are a bit easier. Lets say n is the number of entries and h the heights of the tree.

A large query will obviously return all elements, so the complexity is O(n). A very narrow query will have to traverse from root to exactly one leaf, hence O(h). This gives combined O(n+h).

There are some border cases: If the data is, for example, shaped such that all points are on one line (0,1),(0,2),(0,3),(0,4),... and they are inserted in order, then the tree may degenerate to a list such that h=n, unless you perform some rebalancing.

If you are interested in quadtree variants, you may want to look at the recent PH-Tree. The PDF contains some complexity calculations. The PH-Tree is a region quadtree (decomposition into quadrants) based on bit-level-trie decomposition. The maximum depth is therefore equal to the number of bits in the values, usually 32 or 64. The structure is independent of the insertion order, so there is no rebalancing (nor is it required, because depth is limited). Some technical updates can be found here. (Disclaimer: This is based on my own research when I was at the University).