The example data are shown below:
id | a | b | n1 | n2 |
---|---|---|---|---|
1 | 1 | 1 | 10 | 20 |
2 | 2 | 2 | 20 | 40 |
3 | 0 | 0 | 10 | 20 |
4 | 1 | 0 | 20 | 40 |
5 | 0 | 1 | 10 | 20 |
I need to calculate the scores k1 and k2 in R, where C is a constant:
k1=(a/b)/(n1/n2+C)
k2=(a/b)/(n1+n2+C)
Because row 3 is double-arm zero data (a = 0 and b = 0), k1 and k2 will be NA for that row. If k1 or k2 is NA, an alternative formula is used:
k1=n1/(n1+n2)
k2=n2/(n1+n2)
What I did was use a for loop to compute the value for each row, one cell at a time, but this is very slow when applied to a huge dataset. The apply function seems to be a faster method, but I don't know how to write a working function for apply(data, 1, function), or what kind of input apply expects. Is there an elegant and faster way to do this job other than the for loop? Thank you so much.
My code is pasted below:
k1 = c()
k2 = c()
C = 0.25
for (i in 1:nrow(data)){
  k1[i] = (data[i,"a"]/data[i,"b"])/(data[i,"n1"]/data[i,"n2"]+C)
  k2[i] = (data[i,"a"]/data[i,"b"])/(data[i,"n1"]+data[i,"n2"]+C)
  if (is.na(k1[i])){
    k1[i] = data[i,"n1"]/(data[i,"n1"]+data[i,"n2"])
  }
  if (is.na(k2[i])){
    k2[i] = data[i,"n2"]/(data[i,"n1"]+data[i,"n2"])
  }
}
You can use the mutate() function from {dplyr}:
library(dplyr)

C <- 0.25  # constant, as in the question

# Calculate k1 and k2
data <- data %>%
  # Perform calculation
  mutate(k1 = (a/b)/(n1/n2+C),                    # k1
         k2 = (a/b)/(n1+n2+C),                    # k2
         k1 = ifelse(is.na(k1), n1/(n1+n2), k1),  # Other formula for k1 if k1 is NA
         k2 = ifelse(is.na(k2), n2/(n1+n2), k2))  # Other formula for k2 if k2 is NA
This gives the same result as your code, but is more efficient:
# A tibble: 5 × 6
a b n1 n2 k1 k2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 10 20 1.33 0.0331
2 2 2 20 40 1.33 0.0166
3 0 0 10 20 0.333 0.667
4 1 0 20 40 Inf Inf
5 0 1 10 20 0 0
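If you would rather not depend on {dplyr}, the same idea can be sketched in base R, since arithmetic on data frame columns is already vectorised. The snippet below rebuilds the example data from the question and uses C = 0.25 as in the original loop; `score_rows` is a hypothetical helper name chosen for illustration, not a function from any package. Note that 0/0 produces NaN, which is.na() also reports as TRUE, whereas 1/0 produces Inf and is therefore not replaced by the fallback formula.

```r
# Example data from the question
data <- data.frame(
  id = 1:5,
  a  = c(1, 2, 0, 1, 0),
  b  = c(1, 2, 0, 0, 1),
  n1 = c(10, 20, 10, 20, 10),
  n2 = c(20, 40, 20, 40, 20)
)

# Hypothetical helper: vectorised k1/k2 with the fallback formulas
score_rows <- function(df, C = 0.25) {
  ratio <- df$a / df$b                  # NaN where a == 0 and b == 0
  k1 <- ratio / (df$n1 / df$n2 + C)
  k2 <- ratio / (df$n1 + df$n2 + C)
  total <- df$n1 + df$n2
  # is.na() is TRUE for both NA and NaN, so 0/0 rows get the fallback
  df$k1 <- ifelse(is.na(k1), df$n1 / total, k1)
  df$k2 <- ifelse(is.na(k2), df$n2 / total, k2)
  df
}

result <- score_rows(data)
print(result)
```

As for apply(data, 1, ...): it still calls an R function once per row, so it is usually not much faster than the for loop. Operating on whole columns, as above or with mutate(), avoids the per-row overhead.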