I have recently started using R. When working on carriage of problematic bacteria, I encountered one problem that I hope somebody could help solve. Apologies if the question is on the easy side.
I want to calculate the cumulative proportion of people who become colonized by the problem bug across several time points (a, b, c), as shown in the dataset "df" below. "0" means a negative test, "1" means a positive test for the resistant bug, and "NA" means the test was not done at that time point. The result should be as described in "x": if the person ever tests positive at any time point (a, b, c), they should have the value "1" in x. If all their tests were negative, they should have the value "0", and if they never had a test done, the value should be "NA". Is there a good way to calculate this "x" automatically?
a <- c(0, 0, 1, 0, 0, 1, 0, 0, NA, NA)
b <- c(0, 0, 1, 0, 1, NA, 0, 0, NA, 0)
c <- c(NA, 1, 0, 0, 0, 1, 1, 0, NA, 0)
df <- cbind(a, b, c)
df
x <- c(0, 1, 1, 0, 1, 1, 1, 0, NA, 0)
df <- cbind(df, x)
df
I tried to create the x-variable using ifelse, but get problems with missing values. For instance, using the following expression:
y <- ifelse(a==1 | b==1 | c==1, 1, ifelse(a==0 | b==0 | c==0, 0, NA))
df <- cbind(df, y)
df
... the resulting column erroneously gets "NA" in rows 1 and 10. That is, when there is a combination of 0 and NA, the result should be 0, not NA.
You can use rowSums:
cols <- c('a', 'b', 'c')
+(rowSums(df[, cols], na.rm = TRUE) > 0) * NA^+(rowSums(!is.na(df[, cols])) == 0)
#[1] 0 1 1 0 1 1 1 0 NA 0
This gives the same result as the x shown in the question; however, it might be difficult to understand.
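To make the one-liner easier to follow, it can be unpacked into named steps. This is only a sketch: the intermediate names (m, any_pos, none_tested, res) are introduced here for illustration and are not part of the original answer, and the data are rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data from the question
a <- c(0, 0, 1, 0, 0, 1, 0, 0, NA, NA)
b <- c(0, 0, 1, 0, 1, NA, 0, 0, NA, 0)
c <- c(NA, 1, 0, 0, 0, 1, 1, 0, NA, 0)
m <- cbind(a, b, c)

# 1 if the row contains at least one positive test, else 0 (NAs are ignored)
any_pos <- as.integer(rowSums(m, na.rm = TRUE) > 0)

# TRUE if no test was done at any time point
none_tested <- rowSums(!is.na(m)) == 0

# NA^0 is 1 and NA^1 is NA, so this multiplier leaves tested rows
# unchanged and turns fully untested rows into NA
res <- any_pos * NA^as.integer(none_tested)
res
```

If the NA^ trick feels too cryptic, the last step could arguably be written as ifelse(none_tested, NA, any_pos) instead, which expresses the same rule more directly.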
Here is a simple alternative using apply:
apply(df[, cols], 1, function(x) if(all(is.na(x))) NA else +(any(x == 1, na.rm = TRUE)))
#[1] 0 1 1 0 1 1 1 0 NA 0
This returns NA if all the values in the row are NA; otherwise it checks whether any value in the row is 1.
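The ifelse attempt in the question produces NA for rows that mix 0 and NA because of R's three-valued logic: FALSE | NA is NA, whereas TRUE | NA is TRUE. any() follows the same rule unless na.rm = TRUE is passed, which is why the apply version sets it. A small illustration using row 1 of the question's data (0, 0, NA); the name row1 exists only for this sketch:

```r
row1 <- c(0, 0, NA)

FALSE | NA                    # NA: the missing value could still be TRUE
TRUE | NA                     # TRUE: the result is TRUE whatever the NA is
any(row1 == 1)                # NA: same logic as the ifelse condition
any(row1 == 1, na.rm = TRUE)  # FALSE: NAs are dropped before checking
```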