I have a very large data frame that I need to subset by last values. I know that the data.table library includes the last() function, which returns the last element of a vector, but what I need is to subset foo by the last value of id for every separate value of track. Values in id are consecutive integers, but the last value will be different for every track.
> head(foo)
track id coords.x coords.y
1 0 0 -79.90732 43.26133
2 0 1 -79.90733 43.26124
3 0 2 -79.90733 43.26124
4 0 3 -79.90733 43.26124
5 0 4 -79.90725 43.26121
6 0 5 -79.90725 43.26121
The output would look something like this.
track id coords.x coords.y
1 0 57 -79.90756 43.26123
2 1 98 -79.90777 43.26231
3 2 61 -79.90716 43.26200
... and so on
How would one apply the last() function (or another function like tail()) to produce this output?
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'track', and get the last row of each group with tail. Here df1 stands for the question's foo.
library(data.table)
setDT(df1)[, tail(.SD, 1), by = track]
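Since the question asks about last() specifically, here is a minimal sketch that applies data.table's last() to every column within each track via lapply(.SD, last), next to the equivalent .SD[.N] form. The table toy_dt is hypothetical illustration data, not the question's foo, and the sketch assumes rows are already ordered by id within each track.

```r
library(data.table)

# Hypothetical toy data: ids restart at 0 in each track, rows sorted by id
toy_dt <- data.table(
  track    = c(0, 0, 0, 1, 1, 2),
  id       = c(0, 1, 2, 0, 1, 0),
  coords.x = c(-79.1, -79.2, -79.3, -79.4, -79.5, -79.6),
  coords.y = c(43.1, 43.2, 43.3, 43.4, 43.5, 43.6)
)

# last() returns the final element of a vector, so apply it to each column per track
res_last <- toy_dt[, lapply(.SD, last), by = track]

# Equivalent: take the .N-th (last) row of each group's subset
res_n <- toy_dt[, .SD[.N], by = track]

print(res_last)
```

Both forms depend on row order; if the data might not be sorted, order it first (for example with setorder(toy_dt, track, id)).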
Since the question also says that the values in 'id' are consecutive integers, we can instead create a logical index using diff, get the row index (.I), and subset the rows.
setDT(df1)
df1[df1[, .I[c(diff(id) != 1, TRUE)], by = track]$V1]
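A sketch of the diff-based idea on a hypothetical toy table. Appending TRUE after diff(id) != 1 flags the last row of each run of consecutive ids: within a track whose ids are consecutive, diff(id) is all 1, so only the appended TRUE survives, and .I turns that position into a row number of the full table. The which.max(id) variant at the end is an order-independent alternative I am adding for comparison, not part of the approach described in the answer.

```r
library(data.table)

# Hypothetical toy data with ids restarting in each track
toy_dt <- data.table(
  track = c(0, 0, 0, 1, 1, 2),
  id    = c(0, 1, 2, 0, 1, 0),
  val   = c(10, 11, 12, 20, 21, 30)
)

# Within one track of consecutive ids, only the appended TRUE remains
print(c(diff(c(0, 1, 2)) != 1, TRUE))   # FALSE FALSE TRUE

# Row numbers (.I) of the last row per track, then subset the table by them
idx <- toy_dt[, .I[c(diff(id) != 1, TRUE)], by = track]$V1
res_diff <- toy_dt[idx]

# Alternative that ignores row order: the row holding the largest id per track
idx_max <- toy_dt[, .I[which.max(id)], by = track]$V1
res_max <- toy_dt[idx_max]
```

A single-row track still works: diff() of one value is empty, so the index is just TRUE.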
Or we can do this in base R itself:
df1[!duplicated(df1$track, fromLast=TRUE),]
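A self-contained base R sketch on a hypothetical toy data frame. duplicated(..., fromLast = TRUE) flags every row except the last occurrence of each track, so negating it keeps the final row per track. That only equals "the last id" when rows are sorted by id within each track, so the sketch sorts with order() first; the sorting step is a precaution I am adding, not part of the one-liner.

```r
# Hypothetical toy data frame, deliberately not sorted by id
toy_df <- data.frame(
  track = c(0, 0, 0, 1, 1, 2),
  id    = c(1, 2, 0, 1, 0, 0),
  val   = c(11, 12, 10, 21, 20, 30)
)

# Sort by track, then id, so each track's last row holds its largest id
toy_df <- toy_df[order(toy_df$track, toy_df$id), ]

# Keep rows that are not duplicated when scanning from the end
res_base <- toy_df[!duplicated(toy_df$track, fromLast = TRUE), ]
rownames(res_base) <- NULL
print(res_base)
```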
Another option is dplyr:
library(dplyr)
df1 %>%
group_by(track) %>%
slice(n())
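A sketch of the dplyr route on a hypothetical toy data frame. slice(n()) keeps the n()-th (last) row of each group; in dplyr 1.0.0 and later, slice_tail(n = 1) says the same thing more directly, and filter(id == max(id)) avoids relying on row order. The trailing ungroup() just drops the grouping so later steps are unaffected.

```r
library(dplyr)

# Hypothetical toy data with ids restarting in each track
toy_df <- data.frame(
  track = c(0, 0, 0, 1, 1, 2),
  id    = c(0, 1, 2, 0, 1, 0),
  val   = c(10, 11, 12, 20, 21, 30)
)

# Last row of each track by position
res_slice <- toy_df %>% group_by(track) %>% slice(n()) %>% ungroup()

# Same result with the dedicated helper (dplyr >= 1.0.0)
res_tail <- toy_df %>% group_by(track) %>% slice_tail(n = 1) %>% ungroup()

# Row with the largest id per track, independent of row order
res_max <- toy_df %>% group_by(track) %>% filter(id == max(id)) %>% ungroup()
```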