Automate several calculations in R through data frames


I have a series of vectors, each named after a stock, like FB for Facebook Inc. So I have over 70 such vectors inside a data frame, for example GEEK, IPAS, JCON, etc. For each pair of stocks, say GEEK and JCON, I have to calculate a measure called mutual information. I have written some code to find that measure for a pair of stocks, and it looks like this.
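Concretely, the layout is one column of returns per ticker in a single data frame, something like the following (a hypothetical sketch: the variable names, number of rows and numbers are made up, just to show the shape):

n_obs <- 250   # made-up sample size, e.g. roughly a year of daily returns
returns <- data.frame(
  GEEK = rnorm(n_obs),   # illustrative random numbers, not real returns
  IPAS = rnorm(n_obs),
  JCON = rnorm(n_obs)
  # ... and so on, over 70 columns in total
)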

To find entropyz (the joint entropy of X and Y, i.e. the bivariate entropy of the GEEK and JCON returns):

library(MASS)  # for kde2d

denz <- kde2d(x, y, n = 512, lims = c(xlim, ylim))
z <- denz$z
cell_sizez <- (diff(xlim)/512) * (diff(ylim)/512)
normz <- sum(z)*cell_sizez
integrandz <- -z*log(z)                 # minus sign: entropy is -integral of f*log(f)
entropyz <- sum(integrandz)*cell_sizez
entropyz <- entropyz/normz              # correct for the estimated density not integrating exactly to 1

To find entropyx (the entropy of X, say GEEK returns)

library(ks)  # for kde

denx <- kde(x = x, gridsize = 512, xmin = xlim[1], xmax = xlim[2])
zx <- denx$estimate
cell_sizex <- diff(xlim)/512
normx <- sum(zx)*cell_sizex
integrandx <- -zx*log(zx)               # again, note the minus sign
entropyx <- sum(integrandx)*cell_sizex
entropyx <- entropyx/normx

To find entropyy (entropy of Y, say JCON returns)

deny <- kde(x = y, gridsize = 512, xmin = ylim[1], xmax = ylim[2])
zy <- deny$estimate
cell_sizey <- diff(ylim)/512
normy <- sum(zy)*cell_sizey
integrandy <- -zy*log(zy)
entropyy <- sum(integrandy)*cell_sizey
entropyy <- entropyy/normy

Finally, to find the mutual information of GEEK and JCON

MI <- entropyx+entropyy-entropyz
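(For reference, this is the standard identity MI(X, Y) = H(X) + H(Y) - H(X, Y): the mutual information is the sum of the marginal entropies minus the joint entropy.)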

So, I have found the mutual information for X and Y (the two stocks above). But I have to calculate this measure for over 70 stocks (vectors), which means 70 * 69 / 2 = 2415 iterations; it is like building a correlation matrix, because it is a pairwise comparison. The question is whether anyone knows a way to make R find that mutual information for all pairs (x, y) in my dataset, in other words to iterate this code over every pair of columns in the data frame, thus creating a pairwise matrix.

Thanks a lot!


Solution

  • If you create a function MI that takes in your two vectors of data and returns the value, you could use something like the following to generate a symmetric square matrix with the results in it. If we assume your data is in a data frame df, we could do:

    library(MASS)  # for kde2d
    library(ks)    # for kde

    MI = function(x, y, xlim, ylim){
      # joint entropy of (x, y) from a 2-d kernel density estimate
      denz <- kde2d(x, y, n = 512, lims = c(xlim, ylim))
      z <- denz$z
      cell_sizez <- (diff(xlim)/512) * (diff(ylim)/512)
      normz <- sum(z)*cell_sizez
      integrandz <- -z*log(z)                 # minus sign: entropy is -integral of f*log(f)
      entropyz <- sum(integrandz)*cell_sizez
      entropyz <- entropyz/normz

      # marginal entropy of x
      denx <- kde(x = x, gridsize = 512, xmin = xlim[1], xmax = xlim[2])
      zx <- denx$estimate
      cell_sizex <- diff(xlim)/512
      normx <- sum(zx)*cell_sizex
      integrandx <- -zx*log(zx)
      entropyx <- sum(integrandx)*cell_sizex
      entropyx <- entropyx/normx

      # marginal entropy of y
      deny <- kde(x = y, gridsize = 512, xmin = ylim[1], xmax = ylim[2])
      zy <- deny$estimate
      cell_sizey <- diff(ylim)/512
      normy <- sum(zy)*cell_sizey
      integrandy <- -zy*log(zy)
      entropyy <- sum(integrandy)*cell_sizey
      entropyy <- entropyy/normy

      # MI(X, Y) = H(X) + H(Y) - H(X, Y)
      return(entropyx + entropyy - entropyz)
    }
    # toy data for illustration; in practice df would hold the ~70 return series
    df = data.frame(1:10, 1:10, 1:10, 1:10, 1:10)
    # xlim and ylim must already be defined so that they cover the range of the data
    matrix(
      apply(
        expand.grid(seq_along(df), seq_along(df)), 1,
        FUN = function(ij) MI(df[, ij[1]], df[, ij[2]], xlim, ylim)
      ),
      nrow = ncol(df)
    )
    

    this works because expand.grid gives you all the combinations of column indices in an n^2 by 2 data frame. apply then runs MI on each of those pairs (each row is passed to the anonymous function as a length-2 vector of indices), and the results are reshaped into a square matrix.
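    Since the goal is essentially a correlation-style matrix, it can also help to store the result and label it with the tickers. A small follow-up sketch, assuming the columns of df are named after the stocks (mi_mat is just a name chosen here for illustration):

    mi_mat <- matrix(
      apply(expand.grid(seq_along(df), seq_along(df)), 1,
            FUN = function(ij) MI(df[, ij[1]], df[, ij[2]], xlim, ylim)),
      nrow = ncol(df)
    )
    dimnames(mi_mat) <- list(names(df), names(df))   # label rows/columns with the tickers
    mi_mat["GEEK", "JCON"]                           # with the real data: look up one pair by name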

    Edit: edited for clarity.
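    The expand.grid approach computes every ordered pair, including each series with itself. Since mutual information is symmetric, an alternative (not part of the answer above, just a sketch) is to compute only the 70 * 69 / 2 = 2415 unique pairs mentioned in the question using combn and then mirror them:

    pairs <- combn(ncol(df), 2)     # 2 x (n*(n-1)/2) matrix of column-index pairs
    vals  <- apply(pairs, 2, function(ij) MI(df[, ij[1]], df[, ij[2]], xlim, ylim))

    mi_mat <- matrix(NA_real_, ncol(df), ncol(df),
                     dimnames = list(names(df), names(df)))
    mi_mat[t(pairs)]        <- vals   # fill the upper triangle
    mi_mat[t(pairs[2:1, ])] <- vals   # mirror into the lower triangle
    # diagonal left as NA; MI of a series with itself is just its own entropy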