Search code examples
rdataframedplyrdata-manipulation

Dataframe manipulation in R - selecting certain values


I have the following issue that I can't find an answer to:

I have a df that looks like this:

  id x1_t1 x1_t2 x1_t3 y1_t1 y1_t2 y1_t3
1  1     0     0     0   100   110   150
2  2     0     0     1   150   140   180
3  3     0     1     1    90   150   160

x1 and y1 are in their multiple times (different time points). Depending on when the x1 variable changes from 0 to 1, I want the respective y1 values, so that the resulting dataframe would look like this:

  id y1_before y1_after
1  1       110      150
2  2       140      180
3  3        90      150

I have already messed around with dplyr quite a bit, but haven't found a working solution. Perhaps you could give me an idea. thanks in advance!!


Solution

  • 1) dplyr Using dat shown reproducibly in the Note at the end, define the x1_* columns as x1 and then y1_* columns as y1 and the index of the match to the first 1 (or the last y1 if none) as i. Then use i to extract the desired values from x1 and y1.

    library(dplyr)
    
    dat %>%
      mutate(
        x1 = pick(starts_with("x1")), 
        y1 = pick(starts_with("y1")),
        i = match(1, x1, length(x1)), .by = id) %>%
      summarize(y1_before = y1[[i-1]], y1_after = y1[[i]], .by = id)
    

    giving

      id y1_before y1_after
    1  1       110      150
    2  2       140      180
    3  3        90      150
    

    2) Base R Pretty much the same logic can be used in base R:

    do.call("rbind", by(dat, dat$id, \(z) with(z, {
      x1 <- z[grep("x1", names(z))]
      y1 <- z[grep("y1", names(z))]
      i <- match(1, x1, length(x1))
      data.frame(id, y1_before = y1[[i-1]], y1_after = y1[[i]])
    })))
    

    Note

    dat <- data.frame(
      id = 1:3,
      x1_t1 = integer(3),
      x1_t2 = c(0L, 0L, 1L),
      x1_t3 = c(0L, 1L, 1L),
      y1_t1 = c(100L, 150L, 90L),
      y1_t2 = c(110L, 140L, 150L),
      y1_t3 = c(150L, 180L, 160L)
    )