Search code examples
rdataframemaxrowsdata-cleaning

Returning values from a column based on the last value of another column


I have a dataset like this:

data <- data.frame(Time = c(1,4,6,9,11,13,16, 25, 32, 65),
                  A = c(10, NA, 13, 2, 32, 19, 32, 34, 93, 12),
                  B = c(1, 99, 32, 31, 12, 13, NA, 13, NA, NA),
                  C = c(2, 32, NA, NA, NA, NA, NA, NA, NA, NA))

What I want to retrieve are the values in Time that corresponds to the last numerical value in A, B, and C. For example, the last numerical values for A, B, and C are 12, 13, and 32 respectively.

So, the Time values that correspond are 65, 25, and 4.

I've tried something like data[which(data$Time== max(data$A)), ], but this doesn't work.


Solution

  • We can multiply the row index with the logical matrix, and get the colMaxs (from matrixStats) to subset the 'Time' column

    library(matrixStats)
    data$Time[colMaxs((!is.na(data[-1])) * row(data[-1]))]
    #[1] 65 25  4
    

    Or using base R, we get the index with which/arr.ind, get the max index using a group by operation (tapply) and use that to extract the 'Time' value

    m1 <- which(!is.na(data[-1]), arr.ind = TRUE)
    data$Time[tapply(m1[,1], m1[,2], FUN = max)]
    #[1] 65 25  4
    

    Or with summarise/across in the devel version of dplyr

    library(dplyr)
    data %>% 
        summarise(across(A:C, ~ tail(Time[!is.na(.)], 1)))
    #    A  B C
    #1 65 25 4
    

    Or using summarise_at with the current version of dplyr

    data %>%
         summarise_at(vars(A:C), ~ tail(Time[!is.na(.)], 1))