Search code examples
pythoncoordinatesgeospatialdata-miningkdtree

Reading in Z dimension in KD-Tree


I have been playing around for months on how to best write a program that will analyze multiple tables for similarities in geographical coordinates. I have tried everything now from nested for-loops to currently using a KD-Tree which seems to be working great. However I am not sure it is functioning properly when reading in my 3rd dimension, in this case is defined as Z.

import numpy
from scipy import spatial
import math as ma

def d(a,b):
d = ma.acos(ma.sin(ma.radians(a[1]))*ma.sin(ma.radians(b[1]))
            +ma.cos(ma.radians(a[1]))*ma.cos(ma.radians(b[1]))*(ma.cos(ma.radians((a[0]-b[0])))))
return d

filename1 = "A"
pos1 = numpy.genfromtxt(filename1,
                 skip_header=1,
                 usecols=(1, 2))
z1 = numpy.genfromtxt(filename1,
                 skip_header=1,
                 usecols=(3))
filename2 = "B"
pos2 = numpy.genfromtxt(filename2,
                 #skip_header=1,
                 usecols=(0, 1))
z2 = numpy.genfromtxt(filename2,
                 #skip_header=1,
                 usecols=(2))

filename1 = "A"
data1 = numpy.genfromtxt(filename1,
                 skip_header=1)
                 #usecols=(0, 1))
filename2 = "B"
data2 = numpy.genfromtxt(filename2,
                  skip_header=1)
                  #usecols=(0, 1)
tree1 = spatial.KDTree(pos1)

match = tree1.query(pos2)
#print match
indices_pos1, indices_pos2 = [], []
for idx_pos1 in range(len(pos1)):
    # find indices in pos2 that match this position (idx_pos1)
    matching_indices_pos2 = numpy.where(match[1]==idx_pos1)[0]

    for idx_pos2 in matching_indices_pos2:
        # distance in sph coo
        distance = d(pos1[idx_pos1], pos2[idx_pos2])

        if distance < 0.01 and z1[idx_pos1]-z2[idx_pos2] > 0.001:
            print pos1[idx_pos1], pos2[idx_pos2], z1[idx_pos1], z2[idx_pos2], distance

As you can see I am first calculating the (x,y) position as a single unit measured in spherical coordinates. Each element in file1 is compared to each element in file2. The problem lies somewhere in the Z dimension but I cant seem to crack this issue. When the results are printed out, the Z coordinates are often nowhere near each other. It seems as if my program is entirely ignoring the and statement. Below I have posted a string of results from my data which shows the issue that the z-values are in fact very far apart.

[ 358.98787832   -3.87297365] [ 358.98667162   -3.82408566] 0.694282 0.5310796 0.000853515096105
[ 358.98787832   -3.87297365] [ 359.00303872   -3.8962745 ] 0.694282 0.5132215 0.000484847441066
[ 358.98787832   -3.87297365] [ 358.99624509   -3.84617685] 0.694282 0.5128636 0.000489860962243
[ 359.0065807    -8.81507801] [ 358.99226267   -8.8451829 ] 0.6865379 0.6675241 0.000580562641945
[ 359.0292886     9.31398903] [ 358.99296163    9.28436493] 0.68445694 0.45485374 0.000811677349685

How the out put is structured: [ position1 (x,y)] [position2 (x,y)] [Z1] [Z2] distance

As you can see, specifically in the last example the Z-coordinates are sperated by about .23, which is way over the .001 restriction I typed for it above.

Any insights you could share would be really wonderful!


Solution

  • As for your original problem, you have a simple problem with the sign. You test if z1-z2 > 0.001, but you probably wanted abs(z1-z2) < 0.001 (notice the < instead of a >).

    You could have the tree to also take the z coordinate into account, then you need to give it data as (x,y,z) and not only (x,y). If it doesn't know the z value, it cannot use it.

    It should be possible (although the sklearn API might not allow this) to query the tree directly for a window, where you bound the coordinate range and the z range independently. Think of a box that has different extensions in x,y,z. But because z will have a different value range, combining these different scales is difficult.

    Beware that the k-d-tree does not know about spherical coordinates. A point at +180 degree and one at -180 degree - or one at 0 and one at 360 - are very far for the k-d-tree, but very close by spherical distance. So it will miss some points!