I'm trying to return both count
(the number of neighbors) and ind
(indices of said neighbors) but I can't unless I call query_radius
twice, which although computationally intensive, is actually faster for me in Python than iterating through and counting the sizes of each row in ind
! This seems terribly inefficient so I wonder is there a way to just return them both in one call?
I tried to access the count and ind objects of tree
after calling query_radius
but it doesn't exist. There's no efficient way to do this in numpy, is there?
>>> array = np.array([[1,2,3], [2,3,4], [6,2,3]])
>>> tree = KDTree(array)
>>> neighbors = tree.query_radius(array, 1)
>>> tree.ind
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'sklearn.neighbors.kd_tree.KDTree' object has no attribute 'ind'
>>> tree.count
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'sklearn.neighbors.kd_tree.KDTree' object has no attribute 'count'
Consider this dataset:
array = np.random.random((10**5, 3))*10
tree = KDTree(array)
There are 3 options as you identify in your question:
1) Call tree.query_radius
twice to get the neighbors and their counts.
neighbors = tree.query_radius(array, 1)
counts = tree.query_radius(array, 1, count_only=1)
This takes 8.347s.
2) Get only the neighbors and then get the counts by iterating through them:
neighbors = tree.query_radius(array, 1)
counts = []
for i in range(len(neighbors)):
counts.append(len(neighbors[i]))
This is significantly faster than the first method and takes 4.697s
3) Now, we can improve the looping time to calculate counts
.
neighbors = tree.query_radius(array, 1)
len_array = np.frompyfunc(len, 1, 1)
counts = len_array(neighbors)
This is the fastest with 4.449s.