I hope I can explain this properly. I am trying to organize some censored data. For example, I have people who are still alive ("."), people who died (1), and people who stopped responding to the study (0). I currently have a data frame that looks like this:
T1 <- c(".",".",".",".",".")
T2 <- c(".",".",".",".",".")
T3 <- c(".",1,".",NA,".")
T4 <- c(NA,NA,".",NA,1)
T5 <- c(NA,NA,".", NA,NA)
df <- data.frame(T1,T2,T3,T4,T5)
T1 T2 T3 T4 T5
1 . . . <NA> <NA>
2 . . 1 <NA> <NA>
3 . . . . .
4 . . <NA> <NA> <NA>
5 . . . 1 <NA>
So, for those who were censored (essentially anyone who didn't die), I want the first "NA" value to be "0", because right now I can't distinguish who has been censored.
In other words, I am looking for code that changes the first "NA" of any row without a "1" in it into a "0". I'm hoping for the output to look something like this:
T1 T2 T3 T4 T5
1 . . . 0 <NA>
2 . . 1 <NA> <NA>
3 . . . . .
4 . . 0 <NA> <NA>
5 . . . 1 <NA>
I may be having issues because I believe "." isn't numeric. If that is the problem and a number would be easier, I would prefer to use "99" to keep things straight. Any suggestions would be appreciated, thank you!
Working row-wise, an easier option is:
# For each row with no death (no "1"), set its first NA to "0"
df[] <- t(apply(df, 1, function(x) if (!1 %in% x)
    replace(x, is.na(x) & !duplicated(x), 0) else x))
-output
> df
T1 T2 T3 T4 T5
1 . . . 0 <NA>
2 . . 1 <NA> <NA>
3 . . . . .
4 . . 0 <NA> <NA>
5 . . . 1 <NA>
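A small illustration of why is.na(x) & !duplicated(x) selects only the first NA: duplicated() flags every repeat of a value, repeated NAs included, so the first NA in a row is the only position where both conditions are TRUE. The check below uses row 4 of the example, written as the character vector that apply() hands to the function (x and first_na are just illustrative names):

```r
# Row 4 of the example, as the character vector apply() passes to the function
x <- c(".", ".", NA, NA, NA)

# duplicated() flags later repeats, including the 2nd and 3rd NA,
# so this is TRUE only at the first NA (position 3)
first_na <- is.na(x) & !duplicated(x)
print(first_na)

# x is character, so replace() stores the 0 as "0"
print(replace(x, first_na, 0))
```

A row with no NA at all (row 3) gives an all-FALSE mask, so replace() leaves it unchanged.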
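Or use a vectorized approach:
# Column of the first NA in each row (rows without an NA get 1, but are excluded below)
j1 <- max.col(is.na(df), "first")
i1 <- seq_len(nrow(df))
# Only rows that contain an NA and no "1" get their first NA set to "0"
df[cbind(i1, j1)][!!rowSums(is.na(df)) &
    !rowSums(df == 1, na.rm = TRUE)] <- "0"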
-output
> df
T1 T2 T3 T4 T5
1 . . . 0 <NA>
2 . . 1 <NA> <NA>
3 . . . . .
4 . . 0 <NA> <NA>
5 . . . 1 <NA>
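On the question of using a number instead of ".": one option is to recode "." to 99, as suggested in the question, and convert every column to numeric. The same vectorized idea then works with a numeric 0. This is a sketch assuming the coding 99 = still alive, 1 = died, NA = no response yet; the helper names na_mat, has_na, no_death and censored are illustrative only.

```r
# Example data from the question
T1 <- c(".",".",".",".",".")
T2 <- c(".",".",".",".",".")
T3 <- c(".",1,".",NA,".")
T4 <- c(NA,NA,".",NA,1)
T5 <- c(NA,NA,".", NA,NA)
df <- data.frame(T1, T2, T3, T4, T5)

# Assumed coding: 99 = still alive, 1 = died, NA = no response yet.
# Recode "." to 99 and make every column numeric.
df[] <- lapply(df, function(col) {
  col <- as.character(col)   # guard against factor columns (R < 4.0)
  col[col %in% "."] <- "99"  # %in% is FALSE for NA, so NAs are left alone
  as.numeric(col)            # "99" -> 99, "1" -> 1, NA stays NA
})

# Same vectorized idea as above, now with a numeric 0
na_mat   <- is.na(df)
has_na   <- rowSums(na_mat) > 0                  # row has at least one NA
no_death <- rowSums(df == 1, na.rm = TRUE) == 0  # row never records a 1
j1 <- max.col(na_mat, "first")                   # column of the first NA
i1 <- seq_len(nrow(df))
censored <- has_na & no_death
df[cbind(i1[censored], j1[censored])] <- 0

df
```

Here only rows 1 and 4 qualify, so they get a 0 in T4 and T3 respectively; rows 2 and 5 keep their 1, and row 3 (no NA) is untouched. Keeping everything numeric also means the columns can go straight into further numeric summaries without a character-to-number conversion later.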