Search code examples
algorithmbinary-treebinary-search-tree

2 dimensions Binary Search Tree and median cut


I was asked a question today to build a BST from set of coordinates, rather than a set of numbers. So the input would be unsorted coordinates

[(1,2), (5,7), (0,5)]

or in other form:

x = [1,5,0] y =[2,7,5]

The heuristic to select a point to a tree is using an axis which has a largest range (median-cut algorithm). Can someone explain why this heuristic is best? We are trying to cut out as many points on each step, so we can have y-axis with a very small range but a lot of points with that range, and x-axis with a wide range, and I just cannot see why this is yielding the best result... If someone could explain this very simply I would be grateful.


Solution

  • Except for the axis selection procedure, your tree is a K-D tree.

    Selecting the axis with the largest range isn't objectively the best heuristic, but it is a good heuristic.

    In this kind of tree, every node has a corresponding rectangular region of space. When you search for, for example, all of the points within a particular radius, you must visit every node with a region that overlaps the target disc.

    The largest-range heuristic tries to make the rectangular regions for each node as square as possible. This prevents nodes with "far away" points from overlapping the target disc, so you don't have to search them. Long, thin, rectangles, on the other hand, overlap a lot of search regions that are far away from most of their points.

    It's optimal when the aspect ratio of the nodes' rectangular regions is the same as the aspect ratio of your search regions.