R: How to perform calculation by rows using data stored in defined columns?


The example data is shown below:

id  a  b  n1  n2
1   1  1  10  20
2   2  2  20  40
3   0  0  10  20
4   1  0  20  40
5   0  1  10  20
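
For anyone who wants to reproduce this, the table above could be rebuilt as a data frame roughly like this. It is only a sketch: the column names come from the table, and storing the values as plain numerics is my assumption.

```r
# Rebuild the example table as a data frame.
# Column names are taken from the question; numeric storage is an assumption.
data <- data.frame(
  id = 1:5,
  a  = c(1, 2, 0, 1, 0),
  b  = c(1, 2, 0, 0, 1),
  n1 = c(10, 20, 10, 20, 10),
  n2 = c(20, 40, 20, 40, 20)
)

print(data)
```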

I need to calculate the scores k1 and k2 in R.

Assume C is a constant:

k1=(a/b)/(n1/n2+C)

k2=(a/b)/(n1+n2+C)

Because row 3 is double-arm zero data (a = 0 and b = 0), a/b is 0/0, so k1 and k2 evaluate to NaN, which is.na() treats as missing. If k1 or k2 is NA, an alternative formula is used instead:

k1=n1/(n1+n2)

k2=n2/(n1+n2)
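
To make the NA condition concrete, here is a small check of how R evaluates the main formula for rows 3 and 4 of the example. The names `r3` and `r4` are just illustrative, and C = 0.25 is the value used in the loop further down.

```r
C <- 0.25  # constant, same value as in the question's loop

# Row 3: a = 0, b = 0, so a/b is 0/0
r3 <- (0 / 0) / (10 / 20 + C)
is.nan(r3)       # TRUE
is.na(r3)        # TRUE, because is.na() also flags NaN

# Row 4: a = 1, b = 0, so a/b is 1/0
r4 <- (1 / 0) / (20 / 40 + C)
is.na(r4)        # FALSE
is.infinite(r4)  # TRUE
```

So the alternative formula is triggered only when both a and b are zero. A row with only b = 0 gives Inf and keeps the result of the main formula.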

What I did was use a for loop to compute the value for every single row, but this will be very slow when applied to a huge dataset. The apply function seems to be a faster method, but I could not work out how to write a runnable function for apply(data, 1, function), and I don't know what kind of input apply passes to that function. Is there an elegant and faster way to do this job than the for loop? Thank you so much.

My code is pasted below:

k1 = c()
k2 = c()
C = 0.25

for (i in 1:nrow(data)){
  k1[i] = (data[i,"a"]/data[i,"b"])/(data[i,"n1"]/data[i,"n2"]+C)
  k2[i] = (data[i,"a"]/data[i,"b"])/(data[i,"n1"]+data[i,"n2"]+C)
  
  if (is.na(k1[i])){
    k1[i] = data[i,"n1"]/(data[i,"n1"]+data[i,"n2"])
  }
  
  if (is.na(k2[i])){
    k2[i] = data[i,"n2"]/(data[i,"n1"]+data[i,"n2"])
  }
}

Solution

  • You can use the mutate() function from {dplyr}:

    library(dplyr)

    C <- 0.25  # constant, same value as in the question

    # Calculate k1 and k2
    data <- data %>% 
        # Perform calculation
        mutate(k1 = (a/b)/(n1/n2+C),                     # k1
               k2 = (a/b)/(n1+n2+C),                     # k2
               k1 = ifelse(is.na(k1), n1/(n1+n2), k1),   # Other formula for k1 if k1 is NA
               k2 = ifelse(is.na(k2), n2/(n1+n2), k2))   # Other formula for k2 if k2 is NA
    

    This returns the same result as your loop, but the calculation is vectorised over whole columns, so it is more efficient:

    # A tibble: 5 × 6
          a     b    n1    n2      k1       k2
      <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>
    1     1     1    10    20   1.33    0.0331
    2     2     2    20    40   1.33    0.0166
    3     0     0    10    20   0.333   0.667 
    4     1     0    20    40 Inf     Inf     
    5     0     1    10    20   0       0
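
If you would rather not depend on {dplyr}, the same idea can be sketched in base R, since arithmetic on data frame columns is already vectorised. The second half shows what a function for apply(data, 1, ...) could look like: each row arrives as a named numeric vector. The names `ratio`, `total`, `score_row` and `by_row` are my own, not from the question, and this is only one possible way to write it.

```r
# Example data from the question (numeric storage is an assumption)
data <- data.frame(
  id = 1:5,
  a  = c(1, 2, 0, 1, 0),
  b  = c(1, 2, 0, 0, 1),
  n1 = c(10, 20, 10, 20, 10),
  n2 = c(20, 40, 20, 40, 20)
)
C <- 0.25

# Vectorised version: every expression works on whole columns at once
ratio <- data$a / data$b
total <- data$n1 + data$n2
k1 <- ratio / (data$n1 / data$n2 + C)
k2 <- ratio / (total + C)
data$k1 <- ifelse(is.na(k1), data$n1 / total, k1)  # fallback where k1 is NA/NaN
data$k2 <- ifelse(is.na(k2), data$n2 / total, k2)  # fallback where k2 is NA/NaN

# apply() version: `row` is a named numeric vector holding one row
score_row <- function(row) {  # hypothetical helper name
  ratio <- row[["a"]] / row[["b"]]
  total <- row[["n1"]] + row[["n2"]]
  k1 <- ratio / (row[["n1"]] / row[["n2"]] + C)
  k2 <- ratio / (total + C)
  if (is.na(k1)) k1 <- row[["n1"]] / total
  if (is.na(k2)) k2 <- row[["n2"]] / total
  c(k1 = k1, k2 = k2)
}

# apply() returns one column per row here, so transpose to get rows back
by_row <- t(apply(data[, c("a", "b", "n1", "n2")], 1, score_row))

print(data)
print(by_row)
```

Both versions should give the same numbers. The apply() version still calls an R function once per row, so on a large dataset I would expect the vectorised form to be the faster of the two.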