I have a data set as below:
data={ 'StoreID':['a','b','c','d'],
'Sales':[1000,200,500,800],
'Profit':[600,100,300,500]
}
data=pd.DataFrame(data)
data.set_index(['StoreID'],inplace=True,drop=True)
X=data.values
from sklearn.metrics.pairwise import euclidean_distances
dist=euclidean_distances(X)
Now I get an array as below:
array([[0. ,943,583,223],
[943, 0.,360,721],
[583,360,0., 360],
[223,721,360, 0.]])
My purpose to get unique combinations of stores and their corresponding distance. I would like the end results as a data frame below:
Store NextStore Dist
a b 943
a c 583
a d 223
b c 360
b d 721
c d 360
Thank you for your help!
You probably want pandas.melt which will "unpivot" the distance matrix into tall-and-skinny format.
m = pd.DataFrame(dist)
m.columns = list('abcd')
m['Store'] = list('abcd')
...which produces:
a b c d Store
0 0.000000 943.398113 583.095189 223.606798 a
1 943.398113 0.000000 360.555128 721.110255 b
2 583.095189 360.555128 0.000000 360.555128 c
3 223.606798 721.110255 360.555128 0.000000 d
Melt data into tall-and-skinny format:
pd.melt(m, id_vars=['Store'], var_name='nextStore')
Store nextStore value
0 a a 0.000000
1 b a 943.398113
2 c a 583.095189
3 d a 223.606798
4 a b 943.398113
5 b b 0.000000
6 c b 360.555128
7 d b 721.110255
8 a c 583.095189
9 b c 360.555128
10 c c 0.000000
11 d c 360.555128
12 a d 223.606798
13 b d 721.110255
14 c d 360.555128
15 d d 0.000000
Remove redundant rows, convert dist to int, and sort:
df2 = pd.melt(m, id_vars=['Store'],
var_name='NextStore',
value_name='Dist')
df3 = df2[df2.Store < df2.NextStore].copy()
df3.Dist = df3.Dist.astype('int')
df3.sort_values(by=['Store', 'NextStore'])
Store NextStore Dist
4 a b 943
8 a c 583
12 a d 223
9 b c 360
13 b d 721
14 c d 360