I am trying to create a new set of variables based on observations at 5 different time points. However, not every row has an observation at every time point. Suppose the data look something like this:
```r
X1 <- c(NA,NA,7,8,1,5)
X2 <- c(NA,0,0,NA,3,7)
X3 <- c(NA,2,3,4,2,7)
X4 <- c(1,1,5,2,1,7)
X5 <- c(2,NA,NA,4,3,NA)
df <- data.frame(X1,X2,X3,X4,X5)
```
```
  X1 X2 X3 X4 X5
1 NA NA NA  1  2
2 NA  0  2  1 NA
3  7  0  3  5 NA
4  8 NA  4  2  4
5  1  3  2  1  3
6  5  7  7  7 NA
```
I want to create 5 new variables, say T1 to T5, so that T1 is populated with the first non-NA value in that row and T2 to T5 take the values that follow it in their original order (including any NAs), with the remaining positions at the end filled with NA:
```
  X1 X2 X3 X4 X5 T1 T2 T3 T4 T5
1 NA NA NA  1  2  1  2 NA NA NA
2 NA  0  2  1 NA  0  2  1 NA NA
3  7  0  3  5 NA  7  0  3  5 NA
4  8 NA  4  2  4  8 NA  4  2  4
5  1  3  2  1  3  1  3  2  1  3
6  5  7  7  7 NA  5  7  7  7 NA
```
Any suggestions? Thank you in advance!
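```r
fun <- function(z) {
  # index of the first non-NA value (which.max on a logical returns the first TRUE)
  ind <- which.max(!is.na(z))
  # defensive guard: which.max() only returns integer(0) for a zero-length vector
  if (!length(ind)) ind <- 1
  # rotate: values from the first non-NA onward, then the leading NAs moved to the end
  c(z[ind:length(z)], if (ind > 1) z[1:(ind-1)])
}
cbind(df, setNames(as.data.frame(t(apply(df, 1, fun))), sub("^X", "T", names(df))))
#   X1 X2 X3 X4 X5 T1 T2 T3 T4 T5
# 1 NA NA NA  1  2  1  2 NA NA NA
# 2 NA  0  2  1 NA  0  2  1 NA NA
# 3  7  0  3  5 NA  7  0  3  5 NA
# 4  8 NA  4  2  4  8 NA  4  2  4
# 5  1  3  2  1  3  1  3  2  1  3
# 6  5  7  7  7 NA  5  7  7  7 NA
```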
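For comparison, the same per-row shift can also be written without `which.max`, by flagging every position from the first non-`NA` onward with `cumsum(!is.na(z)) > 0`. This is a swapped-in alternative, not the approach above; `shift_left` and `res` are hypothetical names chosen for this sketch, and it assumes the same all-numeric `df` from the question. Interior `NA`s (such as `X2` in row 4) stay in place, matching the expected output.

```r
# Hypothetical helper: same result as `fun`, written with cumsum() instead of which.max()
shift_left <- function(z) {
  # TRUE from the first non-NA value onward (the running count of non-NAs becomes > 0)
  keep <- cumsum(!is.na(z)) > 0
  # keep those values in order, then pad with as many NAs as leading NAs were dropped
  c(z[keep], rep(NA, sum(!keep)))
}

# the example data from the question
df <- data.frame(X1 = c(NA,NA,7,8,1,5), X2 = c(NA,0,0,NA,3,7),
                 X3 = c(NA,2,3,4,2,7), X4 = c(1,1,5,2,1,7),
                 X5 = c(2,NA,NA,4,3,NA))

# same apply / t / setNames / cbind pipeline as above, with the alternative helper
res <- cbind(df, setNames(as.data.frame(t(apply(df, 1, shift_left))),
                          sub("^X", "T", names(df))))
res
```

Because the padding length is `sum(!keep)`, every row still returns exactly `ncol(df)` values, which is what lets `apply` bind the results into a matrix.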
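Walkthrough:

- in `fun`, `which.max(!is.na(z))` returns the index of the first non-`NA` value in the vector (which will be a "row" within the frame). In the corner case where all values are `NA`, `!is.na(z)` is all `FALSE` and `which.max` returns `1`, so the row is returned unchanged (all `NA`); `which.max` only returns `integer(0)` for a zero-length vector, which is what the `length` check guards against;
- `apply(., 1, fun)` converts `df` to a matrix, then applies the function `fun` to each row;
- `apply(., 1, ..)` stores each row's result as a column, i.e. it returns a transposed matrix, so we transpose it back with `t(.)`;
- convert it with `as.data.frame(.)`, then change the column names with `setNames` and `sub(.)` (`X1` becomes `T1`, and so on);
- `cbind` it with the original data.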
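A quick check of the two behaviours the walkthrough relies on: how `which.max` treats a logical vector (including the all-`NA` and zero-length cases), and the fact that `apply(., 1, f)` returns one column per input row. The small matrix `m` is made up purely for illustration.

```r
# which.max() on a logical vector gives the position of the first TRUE
which.max(!is.na(c(NA, NA, 7, 8)))   # 3
# all NA: !is.na() is all FALSE, so the first FALSE (position 1) is the maximum
which.max(!is.na(c(NA, NA, NA)))     # 1
# integer(0) is only returned for a zero-length input
which.max(logical(0))                # integer(0)

# apply(m, 1, f) stores each row's result as a column
m <- matrix(1:6, nrow = 2)           # 2 rows, 3 columns
dim(apply(m, 1, rev))                # 3 2, hence the t(.) in the answer
```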