
Find last values by condition


I have a very large data frame that I need to subset by last values. I know that the data.table library includes the last() function, which returns the last element of a vector, but what I need is to subset foo to the row holding the last value of id for each distinct value of track. Values in id are consecutive integers within a track, but the last value differs from track to track.

> head(foo)
  track id  coords.x coords.y
1     0  0 -79.90732 43.26133
2     0  1 -79.90733 43.26124
3     0  2 -79.90733 43.26124
4     0  3 -79.90733 43.26124
5     0  4 -79.90725 43.26121
6     0  5 -79.90725 43.26121

The output would look something like this.

   track  id  coords.x coords.y
1     0   57  -79.90756 43.26123
2     1   98  -79.90777 43.26231
3     2   61  -79.90716 43.26200

... and so on

How would one apply the last() function (or another function like tail()) to produce this output?


Solution

  • We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1), then, grouped by 'track', take the last row of each group with tail (df1 here stands for the data frame called foo in the question).

    library(data.table)
    setDT(df1)[, tail(.SD, 1), by = track]
    

    As the OP also mentioned that the 'id' values are consecutive, we can instead build a logical index with diff that flags each row whose next 'id' is not one greater, together with the final row of each group, then get the row index (.I) and subset the rows.

    setDT(df1)[df1[, .I[c(diff(id) != 1, TRUE)], by = track]$V1]
    

    Or we can do this using base R itself

    df1[!duplicated(df1$track, fromLast=TRUE),]
    

    Or another option is dplyr

    library(dplyr)
    df1 %>%
        group_by(track) %>%
        slice(n())
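
To see what these approaches return, here is a minimal base R sketch on made-up data. The toy df1 below is an assumption for illustration only (three tracks whose ids restart at 0 and end at different values), not the asker's actual data, and the helper names last_rows and max_id are hypothetical.

```r
# Toy data (hypothetical): three tracks of different lengths,
# with consecutive ids that restart at 0 for each track.
df1 <- data.frame(
  track    = rep(c(0, 1, 2), times = c(4, 6, 3)),
  id       = c(0:3, 0:5, 0:2),
  coords.x = -79.90 - seq_len(13) / 1e4,
  coords.y =  43.26 + seq_len(13) / 1e4
)

# duplicated(..., fromLast = TRUE) marks every row except the final
# occurrence of each track value, so negating it keeps the last row per track.
last_rows <- df1[!duplicated(df1$track, fromLast = TRUE), ]

# Independent cross-check: the largest id within each track.
max_id <- tapply(df1$id, df1$track, max)

print(last_rows)
```

This relies on the rows already being ordered by id within each track, as in the question. If that ordering is not guaranteed, sort first, for example with df1[order(df1$track, df1$id), ].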