Search code examples
rdata.tablecoalesce

keep last non missing observation for all variables by group


My data has multiple columns and some of those columns have missing values in different rows. I would like to group (collapse) the data by the variable "g", keeping the last non missing obserbation of each varianle.

Input:

d <- data.table(a=c(1,NA,3,4),b=c(1,2,3,4),c=c(NA,NA,'c',NA),g=c(1,1,2,2))

Desired output

d_g <- data.table(a=c(1,4),b=c(2,4),c=c(NA,'c'),g=c(1,2))

data.table (or dplyr) solution prefered here

OBS:this is related to this question, but the main answers there seem to cause unecessary NAs in some groups


Solution

  • Using data.table :

    library(data.table)
    
    d[, lapply(.SD, function(x) last(na.omit(x))), g]
    
    #   g a b    c
    #1: 1 1 2 <NA>
    #2: 2 4 4    c