scipy.spatial ValueError when trying to run kdtree


Background: I am trying to run a nearest-neighbor search with cKDTree, matching a shapefile that has 201 records with lat/lons against a time series dataset of 8760 hours (the total hours in a year). I am getting an error, so naturally I looked it up. I found this question: scipy.spatial ValueError: "x must consist of vectors of length %d but has shape %s", which reports the same error, but I am having trouble understanding exactly how it was resolved.

Workflow: I pulled the x and y coordinates out of the shapefile and stored them in separate arrays called x_vector and y_vector. The 8760-hour data is in an HDF5 file. I pulled its coordinates out using h5_coords = np.vstack([meta['latitude'], meta['longitude']]).T.

Now I try to run the kdtree,

# Run the kdtree to match nearest values
tree = cKDTree(np.vstack([x_vector, y_vector]))
kdtree_indices = tree.query(h5_coords)[1]

but it results in this same traceback error.

Traceback Error:

Traceback (most recent call last):
  File "meera_extract.py", line 45, in <module>
    kdtree_indices = tree.query(h5_coords)[1]
  File "scipy/spatial/ckdtree.pyx", line 618, in scipy.spatial.ckdtree.cKDTree.query (scipy/spatial/ckdtree.cxx:6996)
ValueError: x must consist of vectors of length 201 but has shape (1, 389880)

Help me, stackoverflow. You're my only hope.


Solution

  • So it seems I need to read up on the differences between vstack and column_stack and on the use of the transpose, i.e. .T. If anyone has the same issue, here is what I changed to make cKDTree work. Many thanks to the community for the comments that helped solve this!

    I changed how the HDF5 coordinates were brought in, switching from vstack to column_stack and removing the transpose .T.

    # Get coordinates of HDF5 file
    h5_coords = np.column_stack([meta['latitude'], meta['longitude']])
    

    Instead of stacking the points inside the cKDTree call, I made a new variable to hold them:

    # combine x and y
    vector_pnts = np.column_stack([x_vector, y_vector])
    

    Then I ran the kdtree without any error.

    # Run the kdtree to match nearest values
    tree = cKDTree(vector_pnts)
    kdtree_indices = tree.query(h5_coords)[1]
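
For anyone comparing the two stacking functions, here is a minimal sketch of why the shapes matter. It assumes the coordinate arrays are 1-D and uses randomly generated stand-in data (the names h5_lat and h5_lon and the 1000-point grid are made up for illustration, not taken from the original files). cKDTree treats each row of its input as one point, so np.vstack([x_vector, y_vector]) builds a tree of 2 points with 201 coordinates each, which is where the "vectors of length 201" in the error message comes from. For 1-D inputs, np.vstack(...).T and np.column_stack(...) give the same array, so under that assumption the change that matters is the array passed to cKDTree.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical stand-in data: 201 shapefile points and 1000 HDF5 grid points.
rng = np.random.default_rng(0)
x_vector = rng.uniform(-110.0, -100.0, size=201)
y_vector = rng.uniform(30.0, 40.0, size=201)
h5_lon = rng.uniform(-110.0, -100.0, size=1000)
h5_lat = rng.uniform(30.0, 40.0, size=1000)

# vstack puts each 1-D array in its own row: 2 "points" of 201 coordinates.
stacked_rows = np.vstack([x_vector, y_vector])
# column_stack puts each 1-D array in its own column: 201 points of 2 coordinates.
vector_pnts = np.column_stack([x_vector, y_vector])
print(stacked_rows.shape)  # (2, 201)
print(vector_pnts.shape)   # (201, 2)

# For 1-D inputs, transposing the vstack result equals column_stack.
same_layout = np.array_equal(stacked_rows.T, vector_pnts)

# Query points need the same number of columns, in the same order, as the tree.
h5_coords = np.column_stack([h5_lon, h5_lat])

# A tree built from the (2, 201) array rejects 2-column query points.
raised_value_error = False
try:
    cKDTree(stacked_rows).query(h5_coords)
except ValueError as exc:
    raised_value_error = True
    print(exc)

# A tree built from the (201, 2) array accepts them.
tree = cKDTree(vector_pnts)
distances, kdtree_indices = tree.query(h5_coords)
print(kdtree_indices.shape)  # (1000,)
```

Each entry of kdtree_indices is the row in vector_pnts nearest to the matching row of h5_coords. Note that the sketch orders both arrays as (x, y). If x_vector holds longitudes, the HDF5 columns would need to be longitude first to match, but that depends on the data and is not confirmed by the post.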