Comparison between rows based on custom function

I have a large size data frame such as below:

Vehicle  longitude latitude trip 
  0        33        155     0
  0        34        156     1
  1        32        154     2
  1        37        154     5
  2        25        145     2
  .        .          .      .
  .        .          .      .

I also defined a custom boolean function to check if coordination is inside a specific area.

def check(main_vehicle_latitude,main_vehicle_longitude,radius,compare_vehicle_latitdude,compare_vehicle_longitude):
  if condition:
     x=True
  return X

Now I want to apply this function to (each row) of my data frame in a way that I compare each vehicle/trip with all other vehicle coordinates and find all the vehicles that have a similar trip location so the final output would be a list for each vehicle that includes all other vehicles that have a similar trip location. For example, the coordinates of vehicle (0) and trip (0) should be compared with all other vehicles to find a list of all vehicles that have similar start coordinates with the first vehicle (trip 0) and continue to check this for all vehicle trips. It seems a bit complicated to explain but hopefully, it was clear enough. I'm looking for a very efficient way since the data frame is large but unfortunately, I'm a beginner so any help with this would be greatly appreciated.

Solution

With the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "vehicle": [0, 0, 1, 1, 2, 3, 4, 5],
        "longitude": [33, 34, 32, 37, 25, 33, 37, 33],
        "latitude": [155, 156, 154, 154, 145, 155, 154, 155],
        "trip": [0, 1, 2, 5, 2, 1, 0, 6],
    }
)

print(df)
# Output
   vehicle  longitude  latitude  trip
0        0         33       155     0
1        0         34       156     1
2        1         32       154     2
3        1         37       154     5
4        2         25       145     2
5        3         33       155     1
6        4         37       154     0
7        5         33       155     6

Here is one way to do it with Pandas groupby, explode, and concat:

# Find matches
tmp = df.groupby(["longitude", "latitude"]).agg(list).reset_index(drop=True)
tmp["match"] = tmp.apply(lambda x: 1 if len(x["vehicle"]) > 1 else pd.NA, axis=1)
tmp = tmp.dropna()

# Format results
tmp["match"] = tmp.apply(
    lambda x: [[v, t] for v, t in zip(x["vehicle"], x["trip"])], axis=1
)
tmp = tmp.explode("vehicle")
tmp = tmp.explode("trip")
tmp["match"] = tmp.apply(
    lambda x: x["match"] if [x["vehicle"], x["trip"]] in x["match"] else pd.NA, axis=1
)
tmp = tmp.dropna()
tmp["match"] = tmp.apply(
    lambda x: [p[0] for p in x["match"] if p[0] != x["vehicle"]], axis=1
)

# Add results back to initial dataframe
df = pd.concat(
    [df.set_index(["vehicle", "trip"]), tmp.set_index(["vehicle", "trip"])], axis=1
)

Then:

print(df)
# Output
              longitude  latitude   match
vehicle trip
0       0            33       155  [3, 5]
        1            34       156     NaN
1       2            32       154     NaN
        5            37       154     [4]
2       2            25       145     NaN
3       1            33       155  [0, 5]
4       0            37       154     [1]
5       6            33       155  [0, 3]