algorithm scipy binary-search-tree kdtree

KD Tree: meaning of `leafsize` parameter

I am looking at SciPy's KD Tree implementation. I am confused by the leafsize parameter, which is said to be

The number of points at which the algorithm switches over to brute-force. Default: 16.

Is this the number of leaves in the BST? If so, that means by default, the KD tree will contain no more than 32 points. This seems unreasonably small, especially for my use case of k=2. Am I interpreting the parameter incorrectly?

Solution

The parameter leafsize does not control the maximum number of leaves in the tree (which isn't bounded), but rather the number of input points associated with a single leaf in the tree. Consider 64 points being added to a tree. If each point gets associated with a single leaf, you will get a full tree that is seven levels deep and any processing requiring a point will descend these seven levels to find it. Alternatively, if the leafsize is 16, you will get a tree that is only 3 levels deep and each leaf in the tree is associated with 16 points. Processing a point involves descending three levels of the tree and then testing each of the 16 points in the leaf (since they are unordered). This value is can be tuned for performance, and empirically, the best value depends on how exactly the tree is being used.

Conceptually, here is an example of the trees that would be built for different leafsize values.

You can see how the leafsize parameter short-circuits the tree building process in the scipy source code. This will leave between leafsize/2 and leafsize points unordered in the leafs of the tree.