
Create relative entropy matrix from pandas dataframe


I have a dataframe of values, say:

df = pd.DataFrame(np.array([[0.2, 0.5, 0.3], [0.1, 0.2, 0.5], [0.4, 0.3, 0.3]]),
                   columns=['a', 'b', 'c'])

in which every row is a vector of probabilities. I want to compute something like the matrix returned by df.corr(), but with relative entropy in place of correlation.

What is the best way to do this? I can't find a way to hook into the .corr() method and simply change the function it uses.


Solution

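  • IIUC, you can pass a callable as the method argument of .corr. Note that .corr compares columns pairwise, so if the probability vectors are the rows, transpose first with df.T.corr(...):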

    import pandas as pd
    import numpy as np
    
    from scipy.stats import entropy
    
    df = pd.DataFrame(np.array([[0.2, 0.5, 0.3], [0.1, 0.2, 0.5], [0.4, 0.3, 0.3]]),
                       columns=['a', 'b', 'c'])
    
    res = df.corr(method=entropy)
    print(res)
    

    Output

              a         b         c
    a  1.000000  0.160246  0.270608
    b  0.160246  1.000000  0.167465
    c  0.270608  0.167465  1.000000
    

    From the pandas documentation for the method parameter of DataFrame.corr:

    callable: callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
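Because .corr forces a diagonal of 1 and a symmetric result, its output is not a true relative entropy matrix: relative entropy is 0 between a distribution and itself, and D(p || q) generally differs from D(q || p). If the unmodified values are needed, one option is to loop over the rows directly. The following is a minimal sketch, not part of the original answer; relative_entropy_matrix is a hypothetical helper name, and it assumes each row is a probability vector (scipy.stats.entropy normalizes its inputs to sum to 1 before computing).

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy

df = pd.DataFrame(np.array([[0.2, 0.5, 0.3], [0.1, 0.2, 0.5], [0.4, 0.3, 0.3]]),
                  columns=['a', 'b', 'c'])


def relative_entropy_matrix(frame):
    """Hypothetical helper: entry (i, j) is D(row_i || row_j)."""
    rows = frame.to_numpy(dtype=float)
    n = len(rows)
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # entropy(pk, qk) returns the relative entropy of pk with respect to qk
            out[i, j] = entropy(rows[i], rows[j])
    # Label both axes with the original row index
    return pd.DataFrame(out, index=frame.index, columns=frame.index)


kl = relative_entropy_matrix(df)
print(kl)
```

The double loop is quadratic in the number of rows, which is fine for small frames. To compare columns instead of rows, pass df.T.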