I'm actually working on high dimensional data (~50.000-100.000 features) and nearest neighbors search must be performed on it. I know that KD-Trees has poor performance as dimensions grows, and also I've read that in general, all space-partitioning data structures tends to perform exhaustive search with high dimensional data.
Additionally, there are two important facts to be considered (ordered by relevance):
So, I need some advice about:
Can I perform NN search in high dimensional space?
No. Because of the curse of dimensionality, data structures that perform Nearest Neighbour search good in lower dimensions, fail to perform well in a high dimensional place. As a matter of fact, the query times becomes almost equally to the brute force one, thus it is worthless.
As a result, in a high dimensional space, one should go for Approximate Nearest Neighbour (ANN) search. To be honest, it's a must.
Which data structure to perform ANN?
I would suggest LSH, or a number of RKD trees. In my answer here, I mention some good libraries that perform ANN in C++. However, notice that LSH solved the R-nearest neighbour problem, thus you specify the parameter R, which is actually the radius. Then, LSH will look for NN's inside that R from the query point, thus you can't really request k NN's.
On the other hand, RKD trees can do that and return you k NN's. I have a project, that builds a forest of RKD trees and performs ANN search in C++, but it targets only high dimensions. It can handle GIST datasets of 10^6 images in 960 dimensions in < 1 sec with about 90% outputs being true nearest neighbours. The name is kd-GeRaF. It will be updated in the next month with a distributed version, but it's already tested and ready to use. It also has a cute logo. :)
I also feel that you should read my answer, which says that the optimal data structure depends on the data.