I'm using FLANN in MATLAB with SIFT feature descriptors as my data. There is a function:
[result, ndists] = flann_search(index, testset, ...);
Here the index is built with a kd-tree. According to the user manual, result returns the nearest neighbors of the samples in testset, and ndists contains the corresponding distances between the test samples and their nearest neighbors. I used the Euclidean distance and found that the distances in ndists differ from the ones I computed from the original data. Even stranger, all the numbers in ndists are integers, which is rarely the case for Euclidean distances. Can you help me explain this?
FLANN by default returns the squared Euclidean distance, (x1 - y1)^2 + ... + (xn - yn)^2, without taking the square root. You can change the metric with flann_set_distance_type(type, order) (see the manual).
An example:
from pyflann import FLANN
import numpy as np

# Three reference points in 5 dimensions.
dataset = np.array(
    [[1., 1, 1, 2, 3],
     [10, 10, 10, 3, 2],
     [100, 100, 2, 30, 1]
     ])
# Two query points.
testset = np.array(
    [[1., 1, 1, 1, 1],
     [90, 90, 10, 10, 1]
     ])
# Find the single nearest neighbor of each query point.
result, dists = FLANN().nn(
    dataset, testset, 1, algorithm="kmeans", branching=32, iterations=7, checks=16)
Output:
>>> result
array([0, 2], dtype=int32)
>>> dists
array([ 5., 664.])
>>> ((testset[0] - dataset[0])**2).sum()
5.0
>>> ((testset[1] - dataset[2])**2).sum()
664.0
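SIFT descriptors are integer-valued, so every squared difference is an integer, and the squared Euclidean distance, being a sum of them, is an integer too. To get ordinary Euclidean distances, take the square root of the values in ndists.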
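As a quick sanity check that does not need FLANN at all, here is a minimal sketch in plain Python. It reuses the two matched pairs from the example above, written as integers to stand in for SIFT descriptors; the helper name squared_euclidean is made up for this illustration and is not part of FLANN or pyflann.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared coordinate differences (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Query/neighbor pairs taken from the example above, as integer vectors.
pairs = [
    ([1, 1, 1, 1, 1], [1, 1, 1, 2, 3]),
    ([90, 90, 10, 10, 1], [100, 100, 2, 30, 1]),
]

# Integer inputs give integer squared distances.
squared = [squared_euclidean(q, n) for q, n in pairs]

# Taking the square root recovers the ordinary Euclidean distance.
euclidean = [math.sqrt(s) for s in squared]

print(squared)    # [5, 664]
print(euclidean)
```

The squared values match the dists array returned by FLANN above, and the square roots are the distances you would get by computing the Euclidean norm of the difference directly.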