I'd like to vectorize my code and tried
df['results'] = coord.loc[df['a'],'x_coord'] * coord.loc[df['b'],'y_coord']
but it raises "ValueError: cannot reindex on an axis with duplicate labels" because df['a'] and df['b'] both contain duplicate values. These duplicates cannot be removed because they are the whole point: df holds coordinate pairs, so there are rows like (1,0), (1,1), (0,1), etc.
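A minimal reproduction shows where the duplicated labels come from. It uses small hypothetical coord and df frames, not the real data. The lookup result is indexed by the looked-up ids rather than by df's index:

```python
import pandas as pd

# Hypothetical toy data: coord has one row per point id (unique index),
# while df["a"] and df["b"] repeat ids, as in the question.
coord = pd.DataFrame(
    {"x_coord": [10.0, 20.0, 30.0], "y_coord": [1.0, 2.0, 3.0]},
    index=[0, 1, 2],
)
df = pd.DataFrame({"a": [1, 1, 0, 2], "b": [0, 1, 1, 1]})

# The lookup result carries the looked-up labels as its index.
x = coord.loc[df["a"], "x_coord"]
print(x.index.tolist())   # [1, 1, 0, 2]
print(x.index.is_unique)  # False

# pandas aligns Series on index labels rather than positions; matching the
# duplicated labels against df.index is what fails here.
raised = False
try:
    df["results"] = coord.loc[df["a"], "x_coord"] * coord.loc[df["b"], "y_coord"]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```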
This version using apply works, but it is too slow (the real DataFrames have close to a million rows, and there are thousands of them to process):
def calc(a, b):
    result = coord.loc[a, 'x_coord'] * coord.loc[b, 'y_coord']
    return result
df['results'] = df.apply(lambda row: calc(row['a'],row['b']),axis=1)
Any tips on how to fix the error or other approaches for vectorizing/speeding this bit up are welcome!
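This is because coord.loc[df['a'], 'x_coord'] and coord.loc[df['b'], 'y_coord'] return Series indexed by the values of df['a'] and df['b'], which contain duplicates. pandas aligns Series by index label rather than by position, so it cannot line those duplicated labels up with df's index. Converting the lookups to NumPy arrays drops the index, so the values are multiplied element by element in row order (this assumes coord itself has a unique index, i.e. one row per point):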
df['results'] = coord.loc[df['a'],'x_coord'].to_numpy() * coord.loc[df['b'],'y_coord'].to_numpy()
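Here is a self-contained sketch of the fix on hypothetical toy data (the coord and df values are made up, and both options assume coord has one row per id). It also shows Series.map as an alternative. map looks each id up in a coord column and returns a Series on df's own index, so the product aligns with df without converting to NumPy:

```python
import pandas as pd

# Hypothetical toy data in the same shape as the question's.
coord = pd.DataFrame(
    {"x_coord": [10.0, 20.0, 30.0], "y_coord": [1.0, 2.0, 3.0]},
    index=[0, 1, 2],
)
df = pd.DataFrame({"a": [1, 1, 0, 2], "b": [0, 1, 1, 1]})

# Option 1 (the fix above): to_numpy() drops the duplicated index, so the two
# lookups are multiplied position by position. Requires a unique coord.index,
# otherwise a lookup can return more rows than df has.
df["results"] = (
    coord.loc[df["a"], "x_coord"].to_numpy()
    * coord.loc[df["b"], "y_coord"].to_numpy()
)

# Option 2 (alternative): Series.map maps each id through a coord column and
# keeps df's index, so the product lines up with df directly.
df["results_map"] = df["a"].map(coord["x_coord"]) * df["b"].map(coord["y_coord"])

print(df)
```

Both columns should come out identical. One difference: ids missing from coord become NaN with map, whereas .loc raises a KeyError for them.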