I have a large DataFrame like the one below:
Vehicle  longitude  latitude  trip
      0         33       155     0
      0         34       156     1
      1         32       154     2
      1         37       154     5
      2         25       145     2
    ...        ...       ...   ...
I also defined a custom boolean function that checks whether a coordinate lies inside a specific area:
def check(main_vehicle_latitude, main_vehicle_longitude, radius,
          compare_vehicle_latitude, compare_vehicle_longitude):
    x = False
    if condition:  # placeholder for the actual area test
        x = True
    return x
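For illustration only, here is one possible concrete version of check. It assumes that "inside a specific area" means the compared vehicle lies within radius of the main vehicle, measured as plain Euclidean distance in the same units as the coordinates. That distance measure is an assumption, not the original condition; real latitude/longitude in degrees would call for a great-circle formula such as haversine.

```python
import math


def check(main_vehicle_latitude, main_vehicle_longitude, radius,
          compare_vehicle_latitude, compare_vehicle_longitude):
    # Assumption: "inside the area" = within `radius` of the main vehicle,
    # using Euclidean distance in raw coordinate units
    distance = math.hypot(
        compare_vehicle_latitude - main_vehicle_latitude,
        compare_vehicle_longitude - main_vehicle_longitude,
    )
    return distance <= radius
```

For example, check(155, 33, 1.5, 156, 34) is True because the two points are about 1.41 units apart, while the same call with radius 1 is False.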
Now I want to apply this function to each row of my DataFrame: compare each vehicle/trip with the coordinates of all other vehicles and find every vehicle with a similar trip location. The final output should be, for each vehicle, a list of all other vehicles with a similar trip location. For example, the coordinates of vehicle 0, trip 0 should be compared with those of every other vehicle to find all vehicles whose start coordinates are similar to that first trip, and the same check should be repeated for every vehicle trip. It is a bit complicated to explain, but hopefully it is clear enough. I'm looking for a very efficient approach since the DataFrame is large, but I'm a beginner, so any help would be greatly appreciated.
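Calling a Python function once per pair of rows means n² calls, which becomes slow quickly. A minimal vectorized sketch, assuming "similar location" means within a Euclidean radius in raw coordinate units (the radius value below is hypothetical), computes all pairwise distances at once with NumPy broadcasting on a small sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "vehicle": [0, 0, 1, 1, 2, 3, 4, 5],
        "longitude": [33, 34, 32, 37, 25, 33, 37, 33],
        "latitude": [155, 156, 154, 154, 145, 155, 154, 155],
        "trip": [0, 1, 2, 5, 2, 1, 0, 6],
    }
)
radius = 1.5  # hypothetical threshold, same units as the coordinates

lon = df["longitude"].to_numpy(dtype=float)
lat = df["latitude"].to_numpy(dtype=float)
vehicles = df["vehicle"].to_numpy()

# dist[i, j] = Euclidean distance between row i and row j (n x n matrix)
dist = np.hypot(lon[:, None] - lon[None, :], lat[:, None] - lat[None, :])

# A pair matches if it is within the radius and belongs to a different vehicle
within = (dist <= radius) & (vehicles[:, None] != vehicles[None, :])

# For each row, collect the distinct ids of the matching vehicles
df["match"] = [np.unique(vehicles[row]).tolist() for row in within]
```

Setting radius = 0 restricts matches to identical coordinates. The n × n matrix still grows quadratically in memory, so for a very large frame a spatial index such as SciPy's cKDTree, or processing the rows in chunks, would likely scale better.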
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
    {
        "vehicle": [0, 0, 1, 1, 2, 3, 4, 5],
        "longitude": [33, 34, 32, 37, 25, 33, 37, 33],
        "latitude": [155, 156, 154, 154, 145, 155, 154, 155],
        "trip": [0, 1, 2, 5, 2, 1, 0, 6],
    }
)
print(df)
# Output
   vehicle  longitude  latitude  trip
0        0         33       155     0
1        0         34       156     1
2        1         32       154     2
3        1         37       154     5
4        2         25       145     2
5        3         33       155     1
6        4         37       154     0
7        5         33       155     6
Here is one way to do it with pandas groupby, explode, and concat, treating trips at identical coordinates as matches:
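# Find matches: group trips that share identical (longitude, latitude)
# and keep only the locations visited by more than one trip
tmp = df.groupby(["longitude", "latitude"]).agg(list).reset_index(drop=True)
tmp["match"] = tmp.apply(lambda x: 1 if len(x["vehicle"]) > 1 else pd.NA, axis=1)
tmp = tmp.dropna()
# Format results: store the [vehicle, trip] pairs found at each location
tmp["match"] = tmp.apply(
    lambda x: [[v, t] for v, t in zip(x["vehicle"], x["trip"])], axis=1
)
# Exploding the two columns one after the other yields every
# vehicle/trip combination at a location...
tmp = tmp.explode("vehicle")
tmp = tmp.explode("trip")
# ...so keep only the (vehicle, trip) pairs that actually exist
tmp["match"] = tmp.apply(
    lambda x: x["match"] if [x["vehicle"], x["trip"]] in x["match"] else pd.NA, axis=1
)
tmp = tmp.dropna()
# For each trip, list the other vehicles found at the same location
tmp["match"] = tmp.apply(
    lambda x: [p[0] for p in x["match"] if p[0] != x["vehicle"]], axis=1
)
# Add results back to the initial dataframe (trips without a match get NaN)
df = pd.concat(
    [df.set_index(["vehicle", "trip"]), tmp.set_index(["vehicle", "trip"])], axis=1
)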
Then:
print(df)
# Output
              longitude  latitude   match
vehicle trip
0       0            33       155  [3, 5]
        1            34       156     NaN
1       2            32       154     NaN
        5            37       154     [4]
2       2            25       145     NaN
3       1            33       155  [0, 5]
4       0            37       154     [1]
5       6            33       155  [0, 3]
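If exact coordinate matches are all you need, as in the approach above, a self-merge may be a more compact way to build the same match column. This is a sketch of a swapped-in technique (merge plus groupby instead of explode), reusing the toy data; the names pairs, matches, and result are just for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "vehicle": [0, 0, 1, 1, 2, 3, 4, 5],
        "longitude": [33, 34, 32, 37, 25, 33, 37, 33],
        "latitude": [155, 156, 154, 154, 145, 155, 154, 155],
        "trip": [0, 1, 2, 5, 2, 1, 0, 6],
    }
)

# Self-join: pair every trip with every trip at exactly the same coordinates
pairs = df.merge(df, on=["longitude", "latitude"], suffixes=("", "_other"))

# Drop pairs where both sides are the same vehicle (this also removes self-pairs)
pairs = pairs[pairs["vehicle"] != pairs["vehicle_other"]]

# Collect the other vehicles for each (vehicle, trip)
matches = pairs.groupby(["vehicle", "trip"])["vehicle_other"].agg(list).rename("match")

# Left-join back so trips without a match get NaN
result = df.set_index(["vehicle", "trip"]).join(matches)
```

On the toy data this should give the same vehicles in each match cell as the output above. Note that the self-merge creates k² rows for a location shared by k trips, so it can get large if many trips share identical coordinates.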