I have a dataframe df that contains transactions from an individual Name_Give to another individual Name_Receive, like the following:
  Name_Give Name_Receive  Amount
0      John          Tom     300
1       Eva          Tom     700
2     Sarah          Tom     100
3      John          Tom     200
4       Tom          Eva     700
5      John          Eva     300
6      Carl          Eva     250
For each Name_Receive, I would like to compute the Shannon entropy S_j = -sum_i p_i \log p_i, where p_i is each amount divided by the total amount received by user j. For example:
S_Tom = - (300/1300 * np.log(300/1300) + 700/1300 * np.log(700/1300) + 100/1300 * np.log(100/1300) + 200/1300 * np.log(200/1300))
S_Eva = - (700/1250 * np.log(700/1250) + 300/1250 * np.log(300/1250) + 250/1250 * np.log(250/1250))
S_Tom = 1.157
S_Eva = 0.99
I would like to have a dataframe df1 like the following:
  Name  Entropy
0  Tom    1.157
1  Eva     0.99
Use groupby and transform to get the total sum of each group, then divide the Amount column values by each group sum and compute the entropy terms:
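import numpy as np

# total amount received per Name_Receive, aligned to the original rows
g_sum = df.groupby('Name_Receive')['Amount'].transform('sum')
# p_i: each amount as a share of its receiver's total
values = df['Amount']/g_sum
df['Entropy'] = -(values*np.log(values))
df1 = df.groupby('Name_Receive',as_index=False,sort=False)['Entropy'].sum()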
  Name_Receive   Entropy
0          Tom  1.156988
1          Eva  0.989094
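For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby/transform idea. It is a variation, not the exact code above: it groups the per-row terms directly instead of adding an Entropy column to df, and the names p and terms are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name_Give": ["John", "Eva", "Sarah", "John", "Tom", "John", "Carl"],
    "Name_Receive": ["Tom", "Tom", "Tom", "Tom", "Eva", "Eva", "Eva"],
    "Amount": [300, 700, 100, 200, 700, 300, 250],
})

# Share of each transaction within its receiver's total.
p = df["Amount"] / df.groupby("Name_Receive")["Amount"].transform("sum")

# Per-row entropy terms, summed per receiver in order of first appearance.
terms = -(p * np.log(p))
df1 = (
    terms.groupby(df["Name_Receive"], sort=False)
    .sum()
    .rename("Entropy")
    .rename_axis("Name")
    .reset_index()
)
print(df1)
```

This leaves df unchanged and names the columns Name and Entropy, as asked for in the question.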
If the values contain 0s, add this at the end, after the groupby:
df1['Entropy'] = df1['Entropy'].fillna(0)
Since 0*np.log(0) gives nan, use fillna to make it 0.
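A minimal sketch of the zero case, assuming a made-up zero-amount row that is not in the question's data. Here the nan terms are filled on the per-row values before summing, which is a small variation on the fillna step above and makes the "0*log(0) counts as 0" convention explicit instead of depending on how the group sum treats nan. The np.errstate block only silences the expected divide-by-zero warning.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one zero-amount transaction added for illustration.
df = pd.DataFrame({
    "Name_Receive": ["Tom", "Tom", "Tom"],
    "Amount": [300, 0, 700],
})

p = df["Amount"] / df.groupby("Name_Receive")["Amount"].transform("sum")

# log(0) is -inf and 0 * -inf is nan; silence the expected warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    terms = -(p * np.log(p))

# Treat 0 * log(0) as 0 before summing.
terms = terms.fillna(0)
entropy = terms.groupby(df["Name_Receive"], sort=False).sum()
print(entropy)
```

The zero-amount row contributes nothing, so Tom's entropy here is just that of the 0.3 / 0.7 split.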