The algorithm build of kd-tree implemented in the Python programming language is as follows (from http://en.wikipedia.org/wiki/K-d_tree):
class Node: pass
def kdtree(point_list, depth=0):
if not point_list:
return None
# Select axis based on depth so that axis cycles through all valid values
k = len(point_list[0]) # assumes all points have the same dimension
axis = depth % k
# Sort point list and choose median as pivot element
point_list.sort(key=lambda point: point[axis])
median = len(point_list) // 2 # choose median
# Create node and construct subtrees
node = Node()
node.location = point_list[median]
node.left_child = kdtree(point_list[:median], depth + 1)
node.right_child = kdtree(point_list[median + 1:], depth + 1)
return node
Sorting is performed on every step. How to reduce the amount of sorting?
It looks like you're only sorting to split around the median. Instead, you could implement a linear-time selection algorithm such as quickselect, then do a linear-time partition of point_list
. Then, you don't need to sort at all anymore.