Tags: r, matrix, probability, frequency

Divide each cell of a large matrix by the sum of its row


I have a site by species matrix. The dimensions are 375 x 360. Each value represents the frequency of a species in samples of that site.

I am trying to convert this matrix from frequencies to relative abundances at each site.

I've tried a few ways to achieve this and the only one that has worked is using a for loop. However, this takes an incredibly long time or simply never finishes.

Is there a function or a vectorised method of achieving this? I've included my for-loop as an example of what I am trying to do.

relative_abundance <- matrix(0, nrow = nrow(data),
                             ncol = ncol(data), dimnames = dimnames(data))

i=0
j=0

for(i in 1:nrow(relative_abundance)){
  for(j in 1:ncol(relative_abundance)){
    species_freq <- data[i,j]
    row_sum <- sum(data[i,])
    relative_abundance[i,j] <- species_freq/row_sum
  }
}

Solution

  • You could do this using apply, but scale makes things even simpler in this case. Assuming you want to divide columns by their sums:

    set.seed(0)
    relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
    
    freqs <- scale(relative_abundance, center = FALSE, 
                   scale = colSums(relative_abundance))
    

    The matrix is too big to print here, but here's what it should look like:

    > head(freqs[, 1:5])
                [,1]         [,2]        [,3]        [,4]         [,5]
    [1,] 0.004409603 0.0014231499 0.003439803 0.004052685 0.0024026910
    [2,] 0.001469868 0.0023719165 0.002457002 0.005065856 0.0004805382
    [3,] 0.001959824 0.0018975332 0.004914005 0.001519757 0.0043248438
    [4,] 0.002939735 0.0042694497 0.002948403 0.002532928 0.0009610764
    [5,] 0.004899559 0.0009487666 0.000982801 0.001519757 0.0028832292
    [6,] 0.001469868 0.0023719165 0.002457002 0.002026342 0.0009610764
    

    And a sanity check:

    > head(colSums(freqs))
    [1] 1 1 1 1 1 1
    

    Using apply:

    freqs2 <- apply(relative_abundance, 2, function(i) i/sum(i))
    

    This has the advantage of being easily changed to run by rows (pass 1 instead of 2 as the second argument), but apply returns each row's result as a column, so you'd have to transpose the output with t().
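
Since the question asks for division by row sums rather than column sums, here is a minimal sketch of a few row-wise variants. The matrix `counts` is a hypothetical stand-in for the asker's site-by-species data (same dimensions, random values), and the result names `rel1` to `rel4` are only illustrative. All four approaches should give the same result.

```r
# Hypothetical stand-in for the 375 x 360 site-by-species matrix
set.seed(0)
counts <- matrix(sample(1:10, 375 * 360, TRUE), nrow = 375)

row_totals <- rowSums(counts)

# Option 1: plain division. R stores matrices column by column, so a vector
# of length nrow(counts) is recycled down each column and element [i, j]
# ends up divided by row_totals[i].
rel1 <- counts / row_totals

# Option 2: sweep() states the margin explicitly (1 = rows).
rel2 <- sweep(counts, 1, row_totals, "/")

# Option 3: prop.table() with margin = 1 computes row proportions.
rel3 <- prop.table(counts, margin = 1)

# Option 4: apply() over rows, transposed back to sites x species.
rel4 <- t(apply(counts, 1, function(x) x / sum(x)))

# Sanity checks: every row sums to 1 and all variants agree.
stopifnot(isTRUE(all.equal(rowSums(rel1), rep(1, nrow(counts)))))
stopifnot(isTRUE(all.equal(rel1, rel2)),
          isTRUE(all.equal(rel1, rel3)),
          isTRUE(all.equal(rel1, rel4)))
```

Options 1 to 3 avoid an R-level loop over rows, so they are likely the better fit for a matrix of this size. Note that a site whose row sums to zero would produce NaN (0/0) in that row, so such rows may need to be handled separately.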