Is there a fast way to get a subsection of an array based on a research criteria?

I have got the two following numpy arrays. The first one is a single column which contains the ID of tens of millions of points. The second one represents the points and their coordinates x,y. It has 3 columns that are : id, x, y. It also contains tens of millions of points.

import numpy as np

points = np.array([[id], [id], ..., [id]])  # array of points
coordinates = np.array([[id, x, y], ..., [id, x, y]]) # array of points and their coordinates

I would like to get all the rows of the array coordinates which ID figures in the array points. In SQL, it would be written SELECT * FROM coordinates WHERE id IN points (and keep the result ordered the same way as in the array points)

I've got the following constraints :

the IDs are not sorted the same way
I'm searching for a solution without any loops on indexes (to get a faster algorithm)

I've tried the following :


indexes = []
for i in range(len(points)):
    indexes.append(np.where(coordinates[:,0] == points[i])[0][0]
result = coordinates[indexes, :]

Which is a solution with loops and that's what I want to avoid.

Solution

Could you provide some context about how you are loading all this data in Python? I believe that if you store the data in a better format from the get go, you could make such a query much faster.

For example, you can store coordinates in a pandas.Series object where the ids in coordinates can be set as the index column for the corresponding [x, y] values:

import pandas as pd
points = # You can load these as you already have
ids_xy = # Load ids for the coordinates separately
coordinates_xy = # Load the x, y coordinates separately
coordinates = pd.Series(coordinates_xy, index=ids_xy)

For a one time conversion from your current format to this format (applicable if you are running some sort of ipynb and have already loaded the data), you can do:

coordinate_series = pd.Series([(x, y) for _, x, y in coordinates], index=coordinates[:, 0])

With this, you can index all the coordinates you want to query by coordinates.loc[points]