Search code examples
arraysdataframeuniquecombinations

Convert a multidimensional array to a data frame Python


I have a data set as below:

data={ 'StoreID':['a','b','c','d'],
       'Sales':[1000,200,500,800],
       'Profit':[600,100,300,500]
}
data=pd.DataFrame(data)
data.set_index(['StoreID'],inplace=True,drop=True)
X=data.values
from sklearn.metrics.pairwise import euclidean_distances
dist=euclidean_distances(X)

Now I get an array as below:

array([[0. ,943,583,223],
       [943, 0.,360,721],
       [583,360,0., 360],
       [223,721,360, 0.]])

My purpose to get unique combinations of stores and their corresponding distance. I would like the end results as a data frame below:

Store   NextStore   Dist
a       b           943
a       c           583
a       d           223
b       c           360
b       d           721
c       d           360

Thank you for your help!


Solution

  • You probably want pandas.melt which will "unpivot" the distance matrix into tall-and-skinny format.

    m = pd.DataFrame(dist)
    m.columns = list('abcd')
    m['Store'] = list('abcd')
    

    ...which produces:

                a           b           c           d Store
    0    0.000000  943.398113  583.095189  223.606798     a
    1  943.398113    0.000000  360.555128  721.110255     b
    2  583.095189  360.555128    0.000000  360.555128     c
    3  223.606798  721.110255  360.555128    0.000000     d
    

    Melt data into tall-and-skinny format:

    pd.melt(m, id_vars=['Store'], var_name='nextStore')
    
       Store nextStore       value
    0      a         a    0.000000
    1      b         a  943.398113
    2      c         a  583.095189
    3      d         a  223.606798
    4      a         b  943.398113
    5      b         b    0.000000
    6      c         b  360.555128
    7      d         b  721.110255
    8      a         c  583.095189
    9      b         c  360.555128
    10     c         c    0.000000
    11     d         c  360.555128
    12     a         d  223.606798
    13     b         d  721.110255
    14     c         d  360.555128
    15     d         d    0.000000
    

    Remove redundant rows, convert dist to int, and sort:

    df2 = pd.melt(m, id_vars=['Store'], 
                     var_name='NextStore',
                     value_name='Dist')
    df3 =  df2[df2.Store < df2.NextStore].copy()
    df3.Dist = df3.Dist.astype('int')
    df3.sort_values(by=['Store', 'NextStore'])
    
       Store NextStore  Dist
    4      a         b   943
    8      a         c   583
    12     a         d   223
    9      b         c   360
    13     b         d   721
    14     c         d   360