Removing the last 2 rows of each factor level in r

I have a dataframe, where a factor has different amounts of rows. I would like to remove the last 2 rows of each factor level. test<-data.frame(id=c(1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3), val=c(3,5,4,6,7,4,1,6,7,8,4,2,0,3,6,8,1,2), trt=c(1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2))

the outcome should look like this:

   id val trt
1   1   3   1
2   1   5   1
3   1   4   1
4   2   4   1
5   2   1   1
6   2   6   1
7   2   7   1
8   3   2   2
9   3   0   2
10  3   3   2
11  3   6   2
12  3   8   2

An added thing I need is to keep various other columns in the dataframe as well. I edited the example df to include another factor, which is not relevant for this exercise, but I do need it (and a bunch more) for future calculations.

Solution

UPDATE 1/23/2024

A more performant data.table option that does not rearrange the original data:

library(data.table)

setDT(test)[-test[,.I[(.N - 1L):.N], id][[2]]][]
#>     id val trt
#>  1:  1   3   1
#>  2:  1   5   1
#>  3:  1   4   1
#>  4:  2   4   1
#>  5:  2   1   1
#>  6:  2   6   1
#>  7:  2   7   1
#>  8:  3   2   2
#>  9:  3   0   2
#> 10:  3   3   2
#> 11:  3   6   2
#> 12:  3   8   2

Benchmarking against my original answer using a larger dataset:

microbenchmark::microbenchmark(
  SD = test[,.SD[1:(.N - 2)], id],
  I = test[-test[,.I[(.N - 1L):.N], id][[2]]],
  unit = "relative"
)
#> Unit: relative
#>  expr     min       lq     mean  median       uq      max neval
#>    SD 66.9625 65.77887 56.49757 62.2506 57.49157 17.64804   100
#>     I  1.0000  1.00000  1.00000  1.0000  1.00000  1.00000   100

Check that the results are equivalent:

identical(
  setorder(test[,.SD[1:(.N - 2)], id], id),
  setorder(test[-test[,.I[(.N - 1L):.N], id][[2]]], id)
)
#> [1] TRUE

Original Answer

With data.table:

library(data.table)

setDT(test)[,.(val = val[1:(.N - 2)]), id][]
#>     id val
#>  1:  1   3
#>  2:  1   5
#>  3:  1   4
#>  4:  2   4
#>  5:  2   1
#>  6:  2   6
#>  7:  2   7
#>  8:  3   2
#>  9:  3   0
#> 10:  3   3
#> 11:  3   6
#> 12:  3   8

Or, using .SD if you have additional columns:

setDT(test)[,.SD[1:(.N - 2)], id][]
#>     id val trt
#>  1:  1   3   1
#>  2:  1   5   1
#>  3:  1   4   1
#>  4:  2   4   1
#>  5:  2   1   1
#>  6:  2   6   1
#>  7:  2   7   1
#>  8:  3   2   2
#>  9:  3   0   2
#> 10:  3   3   2
#> 11:  3   6   2
#> 12:  3   8   2

Note that this will rearrange the data relative to the original if it is not already grouped by id.

Base R (if test is sorted by id):

test[sequence((n <- rle(test$id)[[1]]) - 2, c(1, cumsum(n[-length(n)]) + 1)),]
#>     id val trt
#>  1:  1   3   1
#>  2:  1   5   1
#>  3:  1   4   1
#>  4:  2   4   1
#>  5:  2   1   1
#>  6:  2   6   1
#>  7:  2   7   1
#>  8:  3   2   2
#>  9:  3   0   2
#> 10:  3   3   2
#> 11:  3   6   2
#> 12:  3   8   2