How to shift values left in R so that the first non-NA value populates column 1


I am trying to create a new set of variables based on observations at 5 different time points. However, not every row has an observation at every time point. Assume the data look something like this:

X1 <- c(NA,NA,7,8,1,5)
X2 <- c(NA,0,0,NA,3,7)
X3 <- c(NA,2,3,4,2,7)
X4 <- c(1,1,5,2,1,7)
X5 <- c(2,NA,NA,4,3,NA)
df <- data.frame(X1,X2,X3,X4,X5)

  X1 X2 X3 X4 X5
1 NA NA NA  1  2
2 NA  0  2  1 NA
3  7  0  3  5 NA
4  8 NA  4  2  4
5  1  3  2  1  3
6  5  7  7  7 NA

I want to create 5 new variables, say T1 - T5, so that T1 is populated with the first non-NA value in that row and every value after it follows in its original order.

  X1 X2 X3 X4 X5 T1 T2 T3 T4 T5
1 NA NA NA  1  2  1  2 NA NA NA
2 NA  0  2  1 NA  0  2  1 NA NA
3  7  0  3  5 NA  7  0  3  5 NA
4  8 NA  4  2  4  8 NA  4  2  4
5  1  3  2  1  3  1  3  2  1  3
6  5  7  7  7 NA  5  7  7  7 NA

Any suggestions? Thank you in advance!


Solution

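    fun <- function(z) {
      # position of the first non-NA value (1 if every value is NA)
      ind <- which.max(!is.na(z))
      # which.max returns integer(0) only for a zero-length input
      if (!length(ind)) ind <- 1
      # rotate: values from ind onward first, then the leading NAs
      c(z[ind:length(z)], if (ind > 1) z[1:(ind-1)])
    }
    # apply fun to each row, transpose back, rename X* to T*, bind to df
    cbind(df, setNames(as.data.frame(t(apply(df, 1, fun))), sub("^X", "T", names(df))))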
    #   X1 X2 X3 X4 X5 T1 T2 T3 T4 T5
    # 1 NA NA NA  1  2  1  2 NA NA NA
    # 2 NA  0  2  1 NA  0  2  1 NA NA
    # 3  7  0  3  5 NA  7  0  3  5 NA
    # 4  8 NA  4  2  4  8 NA  4  2  4
    # 5  1  3  2  1  3  1  3  2  1  3
    # 6  5  7  7  7 NA  5  7  7  7 NA
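
A quick way to convince yourself the result matches the table in the question is to compare the T columns against the desired output. This is only a sketch: `res` and `expected` are hypothetical names, and `expected` is typed in by hand from the question's table.

```r
# Rebuild the question's data and the answer's helper so this runs on its own.
X1 <- c(NA,NA,7,8,1,5)
X2 <- c(NA,0,0,NA,3,7)
X3 <- c(NA,2,3,4,2,7)
X4 <- c(1,1,5,2,1,7)
X5 <- c(2,NA,NA,4,3,NA)
df <- data.frame(X1,X2,X3,X4,X5)

fun <- function(z) {
  ind <- which.max(!is.na(z))
  if (!length(ind)) ind <- 1
  c(z[ind:length(z)], if (ind > 1) z[1:(ind-1)])
}
res <- cbind(df, setNames(as.data.frame(t(apply(df, 1, fun))), sub("^X", "T", names(df))))

# Desired T1-T5 columns, copied from the question (hypothetical helper object).
expected <- data.frame(
  T1 = c(1,0,7,8,1,5),
  T2 = c(2,2,0,NA,3,7),
  T3 = c(NA,1,3,4,2,7),
  T4 = c(NA,NA,5,2,1,7),
  T5 = c(NA,NA,NA,4,3,NA)
)

# all.equal() treats matching NAs as equal; check.attributes = FALSE
# ignores row-name differences between the two frames.
ok <- isTRUE(all.equal(res[, paste0("T", 1:5)], expected, check.attributes = FALSE))
print(ok)
```

If the shifted columns agree with the question's table, `ok` should come out as TRUE.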
    

    Walkthrough:

    • within fun, which.max(!is.na(z)) returns the position of the first non-NA value in the vector (which will be a "row" of the frame); in the corner case where all values are NA, it returns 1, so the row is left unchanged; it returns integer(0) only for a zero-length vector, which is why we check its length before indexing;
    • apply(., 1, fun) converts df to a matrix, then applies the function fun on each row;
    • since apply(., 1, ..) returns a transposed matrix, we t(.) transpose it;
    • since that returns a matrix, we as.data.frame(.) it, then change the column names with setNames and sub(.);
    • finally, cbind it with the original data.
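
To see each step of the walkthrough on its own, the one-liner can be unpacked into intermediate objects. This is a sketch rather than a different method: the names `m`, `tm`, `shifted` and `out` are made up for illustration. It also shows what which.max returns for an all-NA vector and for an empty one, since that determines when the length check actually fires.

```r
# Same data as in the question.
df <- data.frame(
  X1 = c(NA,NA,7,8,1,5),
  X2 = c(NA,0,0,NA,3,7),
  X3 = c(NA,2,3,4,2,7),
  X4 = c(1,1,5,2,1,7),
  X5 = c(2,NA,NA,4,3,NA)
)

fun <- function(z) {
  ind <- which.max(!is.na(z))
  if (!length(ind)) ind <- 1
  c(z[ind:length(z)], if (ind > 1) z[1:(ind-1)])
}

# which.max() on a logical vector gives the position of the first TRUE.
first_hit <- which.max(!is.na(c(NA, NA, 3, NA)))  # 3
all_na    <- which.max(!is.na(c(NA, NA, NA)))     # 1: all FALSE, first position wins
empty     <- which.max(!is.na(numeric(0)))        # integer(0)

# Step 1: apply() over rows; each row's result becomes a COLUMN of m.
m <- apply(df, 1, fun)
dim(m)    # 5 6

# Step 2: transpose so that rows line up with df again.
tm <- t(m)
dim(tm)   # 6 5

# Step 3: matrix -> data frame, then rename X1..X5 to T1..T5.
shifted <- setNames(as.data.frame(tm), sub("^X", "T", names(df)))

# Step 4: bind the shifted columns next to the original ones.
out <- cbind(df, shifted)
print(out)
```

Keeping the intermediate objects around makes it easier to inspect where something goes wrong if the input has a different shape, at the cost of a few extra lines.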