Search code examples
rdata.tableunique

Use unique in r with more than one logical condition


Following data frame in data.table

df <- data.table (id=c(1,1,2,2,3,3,4,4),
                  date=c("2013-11-22","2017-01-24","2017-06-24","2020-02-10","2011-01-03","2013-11-24","2015-01-24","2017-08-24"),
                  status=c("Former","Current","Former","Never","Current",NA,"Current","Former"))
df
id       date  status
1:  1 2013-11-22  Former
2:  1 2017-01-24 Current
3:  2 2017-06-24  Former
4:  2 2020-02-10   Never
5:  3 2011-01-03 Current
6:  3 2013-11-24    <NA>
7:  4 2015-01-24 Current
8:  4 2017-08-24  Former

I want to create a unique row per id with the following logicals. The latest date should be kept. If the status at latest date is <NA> or Never and there was another status for an earlier date, than the row with the earlier date should be kept. I solved this with the following functions:

unique1 <- df[df$status %in% c("Former","Current"),]
unique1 <- unique1[,.SD[which.max(anydate(date))],by=.(id)]
unique_final <- unique(df[order(id,ordered(status,c("Former","Current","Never",NA)))],by='id')
unique_final[match(unique1$id,unique_final$id),]<-unique1

and get these results

id       date  status
1:  1 2017-01-24 Current
2:  2 2017-06-24  Former
3:  3 2011-01-03 Current
4:  4 2017-08-24  Former

Is there a way to combine these two logical subsetting steps? I would like to avoid creating a new data frame and than matching them. I am working with data.table and a solution for a larger data set would be great. Thanks!


Solution

  • Could try:

    library(data.table)
    
    df[, .SD[
      if (all(status %in% c(NA, 'Never'))) .N
      else max(which(!status %in% c(NA, 'Never')))
      ], by = id]
    

    Output:

       id       date  status
    1:  1 2017-01-24 Current
    2:  2 2017-06-24  Former
    3:  3 2011-01-03 Current
    4:  4 2017-08-24  Former