I would like to compute a distance between all elements of two Series:
import pandas as pd
a = pd.Series([1, 2, 3], ['a', 'b', 'c'])
b = pd.Series([4, 5, 6, 7], ['k', 'l', 'm', 'n'])
def dist(x, y):
    return x - y  # (or some arbitrary function)
I achieved the expected result with NumPy broadcasting, then converted the result to a DataFrame to add the index and columns:
import numpy as np
pd.DataFrame(a.values[np.newaxis, :] - b.values[:, np.newaxis],
             columns=a.index,
             index=b.index)
   a  b  c
k -3 -2 -1
l -4 -3 -2
m -5 -4 -3
n -6 -5 -4
This does not feel as robust as operating directly on pandas objects. Is there a way to achieve this in pandas?
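Since `dist` here is built only from element-wise arithmetic, it broadcasts, so the NumPy approach in the question can be wrapped once and reused with any such function. A minimal sketch, assuming the function is NumPy-broadcastable (the helper name `pairwise` is hypothetical, not a pandas API):

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3], ['a', 'b', 'c'])
b = pd.Series([4, 5, 6, 7], ['k', 'l', 'm', 'n'])

def dist(x, y):
    return x - y  # element-wise, so it broadcasts over arrays

def pairwise(a, b, func):
    """Hypothetical helper: func(a_j, b_i) for every pair; rows follow b, columns follow a."""
    # a becomes a row of shape (1, len(a)), b a column of shape (len(b), 1);
    # broadcasting produces a (len(b), len(a)) result
    values = func(a.to_numpy()[np.newaxis, :], b.to_numpy()[:, np.newaxis])
    return pd.DataFrame(values, index=b.index, columns=a.index)

print(pairwise(a, b, dist))
print(pairwise(a, b, lambda x, y: np.abs(x - y)))  # absolute distance instead
```

This only works when `func` uses operations NumPy can apply to whole arrays; a function with Python `if` statements on its arguments would need a different approach.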
In my opinion, NumPy with broadcasting is the faster and better choice here, but a pure pandas solution is possible with a loop via Series.apply (slower):
print(b.apply(lambda x: dist(a, x)))
   a  b  c
k -3 -2 -1
l -4 -3 -2
m -5 -4 -3
n -6 -5 -4
print(b.apply(lambda x: a - x))
   a  b  c
k -3 -2 -1
l -4 -3 -2
m -5 -4 -3
n -6 -5 -4
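The `Series.apply` version still passes the whole Series `a` into `dist`, so `dist` must accept a Series. If the distance function only works on scalars, a nested comprehension is a simple (and slow) fallback. A sketch, where `scalar_dist` is a hypothetical scalar-only function used for illustration:

```python
import pandas as pd

a = pd.Series([1, 2, 3], ['a', 'b', 'c'])
b = pd.Series([4, 5, 6, 7], ['k', 'l', 'm', 'n'])

def scalar_dist(x, y):
    # Hypothetical scalar-only distance: the Python `if` would raise
    # on arrays/Series, so this function cannot be broadcast
    if x > y:
        return x - y
    return y - x

# One row per element of b, one column per element of a
df = pd.DataFrame(
    [[scalar_dist(x, y) for x in a] for y in b],
    index=b.index,
    columns=a.index,
)
print(df)
```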
# your solution (a bit simpler)
df = pd.DataFrame(a.to_numpy() - b.to_numpy()[:, None],
                  columns=a.index,
                  index=b.index)
print(df)
   a  b  c
k -3 -2 -1
l -4 -3 -2
m -5 -4 -3
n -6 -5 -4
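When the distance is a NumPy ufunc (plain subtraction is `np.subtract`), the ufunc's `outer` method builds the same table without manual `None`/`np.newaxis` reshaping. A minimal sketch; note it applies only to ufuncs, not to arbitrary Python functions:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3], ['a', 'b', 'c'])
b = pd.Series([4, 5, 6, 7], ['k', 'l', 'm', 'n'])

# np.subtract.outer(x, y)[j, i] == x[j] - y[i]; transpose so rows follow b
values = np.subtract.outer(a.to_numpy(), b.to_numpy()).T
df = pd.DataFrame(values, index=b.index, columns=a.index)
print(df)
```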