Search code examples
rdataframerow

Transforming values in R data frame using three conditions within groups


I have the following data frame:

 Group   Number    Letter
    G1        1         A
    G1        NA        B
    G1        NA        X
    G1        NA        D
    G2        1         A
    G2        NA        B
    G2        NA        C
    G2        NA        X
    G3        1         A
    G3        2         B
    G3        3         C
    G3        NA        X

I want to transform this dataframe within each group using the following conditions:

  1. If the letter in the Letter column equals “X”, then all preceding rows with NA in the Number column have their letter changed to “U”.
  2. If the preceeding rows have a number in the Number column instead of NA, they stay the same.
  3. Rows with X in the Letter column do not change.

This should yield a data frame as follows:

 Group   Number    Letter
    G1        1         A
    G1        NA        U
    G1        NA        X
    G1        NA        D
    G2        1         A
    G2        NA        U
    G2        NA        U
    G2        NA        X
    G3        1         A
    G3        2         B
    G3        3         C
    G3        NA        X

How can I do this in R, ideally using dplyr?


Solution

  • data.table solution. Now used lapply but there is probably a way to vectorize using the by argument of data.table

    library(data.table)
    
    #read data
    dt <- structure(list(group = c("G1", "G1", "G1", "G1", "G2", "G2", 
    "G2", "G2", "G3", "G3", "G3", "G3"), number = c(1, NA, NA, NA, 
    1, NA, NA, NA, 1, 2, 3, NA), Letter = c("A", "U", "X", "D", "A", 
    "U", "U", "X", "A", "B", "C", "X")), row.names = c(NA, -12L), class = "data.frame", index = integer(0)) |> as.data.table()
    
    #add rows
    dt[, row := 1:nrow(dt)]
    
    #find rows with last observations of x
    maxrows <- dt[Letter == "X", max(row), by = group]
    
    #change values
    lapply(maxrows$group, FUN = function(x){
      dt[group == x & row < maxrows[group == x]$V1 & Letter != "X" & is.na(number), Letter := "U"]
    })
    
    #remove row
    dt[, row := NULL]