I have a series of vectors, each named after a stock, like FB for Facebook Inc. So I have over 70 vectors inside a data frame, for example GEEK, IPAS, JCON, etc. For each pair of stocks, say GEEK and JCON, I have to calculate a measure called mutual information. I have written some code to find that measure for one pair of stocks, and it looks like this.
To find entropyz (the joint entropy of X and Y, say the bivariate entropy of GEEK and JCON returns):

library(MASS)  # provides kde2d

denz <- kde2d(x, y, n = 512, lims = c(xlim, ylim))
z <- denz$z
cell_sizez <- (diff(xlim) / 512) * (diff(ylim) / 512)
normz <- sum(z) * cell_sizez
integrandz <- z * log(z)
# Differential entropy is H = -integral of f * log(f), so the sum is negated
entropyz <- -sum(integrandz) * cell_sizez
entropyz <- entropyz / normz
To find entropyx (the entropy of X, say GEEK returns):

library(ks)  # provides kde

denx <- kde(x = x, gridsize = 512, xmin = xlim[1], xmax = xlim[2])
zx <- denx$estimate
cell_sizex <- diff(xlim) / 512
normx <- sum(zx) * cell_sizex
integrandx <- zx * log(zx)
entropyx <- -sum(integrandx) * cell_sizex  # negated, as above
entropyx <- entropyx / normx
To find entropyy (the entropy of Y, say JCON returns):

deny <- kde(x = y, gridsize = 512, xmin = ylim[1], xmax = ylim[2])
zy <- deny$estimate
cell_sizey <- diff(ylim) / 512
normy <- sum(zy) * cell_sizey
integrandy <- zy * log(zy)
entropyy <- -sum(integrandy) * cell_sizey  # negated, as above
entropyy <- entropyy / normy
Finally, to find the mutual information of GEEK and JCON:

MI <- entropyx + entropyy - entropyz
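As a quick sanity check of the identity MI = H(X) + H(Y) - H(X, Y) that the last line relies on, here is a small discrete example (illustrative only; the kernel-density code above estimates the continuous analogue of these sums):

```r
p_x  <- c(0.5, 0.5)        # X: a fair coin
p_y  <- c(0.5, 0.5)        # Y = X, so Y is fully dependent on X
p_xy <- c(0.5, 0, 0, 0.5)  # joint distribution: mass only on the diagonal

# Discrete Shannon entropy, skipping zero-probability cells
H <- function(p) -sum(p[p > 0] * log(p[p > 0]))

mi <- H(p_x) + H(p_y) - H(p_xy)
mi  # log(2): the two variables share exactly one bit (in nats)
```

Since Y determines X completely here, the mutual information equals the full entropy of either variable, log(2) nats.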
So I have found the mutual information for X and Y (the two stocks above). But I have to calculate this measure for over 70 stocks (vectors), which means 70 * 69 / 2 = 2415 iterations; it is like building a correlation matrix, because it is a pairwise comparison.
The question is whether anyone knows a way to make R find that mutual information for all pairs (x, y) in my dataset; in other words, how to iterate this code over every pair of columns in the data frame, thus creating a pairwise matrix.
Thanks a lot!
If you create a function MI that takes your two vectors of data and returns the value, you could use something like the following to generate a symmetric square matrix with the results. If we assume your data is in a data frame df, we could do:
library(MASS)  # kde2d
library(ks)    # kde

MI <- function(x, y, xlim, ylim) {
  # Joint entropy H(X, Y) from a 2-D kernel density estimate.
  # Note the leading minus: H = -integral of f * log(f). Cells where the
  # estimate is exactly zero are skipped, since 0 * log(0) gives NaN.
  denz <- kde2d(x, y, n = 512, lims = c(xlim, ylim))
  z <- denz$z
  cell_sizez <- (diff(xlim) / 512) * (diff(ylim) / 512)
  normz <- sum(z) * cell_sizez
  entropyz <- -sum(z[z > 0] * log(z[z > 0])) * cell_sizez / normz

  # Marginal entropy H(X)
  denx <- kde(x = x, gridsize = 512, xmin = xlim[1], xmax = xlim[2])
  zx <- denx$estimate
  cell_sizex <- diff(xlim) / 512
  normx <- sum(zx) * cell_sizex
  entropyx <- -sum(zx[zx > 0] * log(zx[zx > 0])) * cell_sizex / normx

  # Marginal entropy H(Y)
  deny <- kde(x = y, gridsize = 512, xmin = ylim[1], xmax = ylim[2])
  zy <- deny$estimate
  cell_sizey <- diff(ylim) / 512
  normy <- sum(zy) * cell_sizey
  entropyy <- -sum(zy[zy > 0] * log(zy[zy > 0])) * cell_sizey / normy

  # MI(X; Y) = H(X) + H(Y) - H(X, Y)
  entropyx + entropyy - entropyz
}
df <- data.frame(1:10, 1:10, 1:10, 1:10, 1:10)
matrix(
  apply(
    expand.grid(seq_along(df), seq_along(df)), 1,
    # apply passes each row as a single vector, so unpack the two indices
    FUN = function(ij) MI(df[, ij[1]], df[, ij[2]], xlim, ylim)
  ),
  nrow = ncol(df)
)
This works because expand.grid gives you all the combinations of column indices in an n^2 by 2 data frame. We then apply the MI function to each of those pairs and store the results in a matrix.
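Since mutual information is symmetric (MI(x, y) = MI(y, x)), the expand.grid approach evaluates every pair twice. As an alternative sketch (the helper name pairwise_matrix is mine, not part of the original), you could compute only the n(n-1)/2 unique pairs with combn and mirror the results:

```r
# Sketch: evaluate each unordered pair of columns once, then mirror the
# values into a full symmetric matrix. f is any symmetric pairwise function,
# e.g. the MI function defined above; extra arguments are passed through.
pairwise_matrix <- function(df, f, ...) {
  n <- ncol(df)
  m <- matrix(NA_real_, n, n, dimnames = list(names(df), names(df)))
  idx <- t(combn(n, 2))  # n(n-1)/2 rows of (i, j) with i < j
  vals <- apply(idx, 1, function(ij) f(df[, ij[1]], df[, ij[2]], ...))
  m[idx] <- vals            # fill the upper triangle
  m[idx[, 2:1]] <- vals     # mirror into the lower triangle
  m                         # diagonal is left as NA
}

# e.g. pairwise_matrix(df, MI, xlim, ylim)
```

For 70 columns this cuts the number of MI evaluations from 4900 to the 2415 mentioned in the question.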
Edit: edited for clarity.