I have the following data frame:
Group Number Letter
G1 1 A
G1 NA B
G1 NA X
G1 NA D
G2 1 A
G2 NA B
G2 NA C
G2 NA X
G3 1 A
G3 2 B
G3 3 C
G3 NA X
I want to transform this dataframe within each group using the following conditions:
This should yield a data frame as follows:
Group Number Letter
G1 1 A
G1 NA U
G1 NA X
G1 NA D
G2 1 A
G2 NA U
G2 NA U
G2 NA X
G3 1 A
G3 2 B
G3 3 C
G3 NA X
How can I do this in R, ideally using dplyr?
data.table solution. Now used lapply
but there is probably a way to vectorize using the by
argument of data.table
library(data.table)
#read data
dt <- structure(list(group = c("G1", "G1", "G1", "G1", "G2", "G2",
"G2", "G2", "G3", "G3", "G3", "G3"), number = c(1, NA, NA, NA,
1, NA, NA, NA, 1, 2, 3, NA), Letter = c("A", "U", "X", "D", "A",
"U", "U", "X", "A", "B", "C", "X")), row.names = c(NA, -12L), class = "data.frame", index = integer(0)) |> as.data.table()
#add rows
dt[, row := 1:nrow(dt)]
#find rows with last observations of x
maxrows <- dt[Letter == "X", max(row), by = group]
#change values
lapply(maxrows$group, FUN = function(x){
dt[group == x & row < maxrows[group == x]$V1 & Letter != "X" & is.na(number), Letter := "U"]
})
#remove row
dt[, row := NULL]