
Selection with a filter on row number and value


I have the following simple data.table, "test". I would like to select those rows among rows 3 to 8 where X equals "A":

library(data.table)
set.seed(1)
test <- data.table(X=c(rep("A",5),rep("B",5)),Y=rnorm(10),Z=rnorm(10))

test[3:8 & X == "A"] # gives this undesired output:

1: A -0.6264538  1.5117812
2: A  0.1836433  0.3898432
3: A -0.8356286 -0.6212406
4: A  1.5952808 -2.2146999
5: A  0.3295078  1.1249309
Warning message:
  In 3:8 & X == "A" :
  longer object length is not a multiple of shorter object length

# desired outcome:

3: A -0.8356286 -0.6212406
4: A  1.5952808 -2.2146999
5: A  0.3295078  1.1249309
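The short base-R check below shows where the warning and the wrong rows come from. It rebuilds the `X` column as a plain vector, so it does not need data.table. Because `3:8` has length 6 and `X == "A"` has length 10, `&` recycles the shorter operand. Every nonzero number counts as `TRUE`, so the combined filter reduces to `X == "A"` alone.

```r
# Same X column as in `test`, as a plain character vector
X <- c(rep("A", 5), rep("B", 5))

# `&` recycles 3:8 to the length of X == "A": 3 4 5 6 7 8 3 4 5 6
recycled <- rep_len(3:8, length(X))
print(recycled)
print(as.logical(recycled))  # all TRUE, since every element is nonzero

# The original filter (warning suppressed) is identical to X == "A" on its own
mask <- suppressWarnings(3:8 & X == "A")
print(identical(mask, X == "A"))  # TRUE
```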

Among rows 3 to 8, I would like to select only those with X == "A". How can I do this? Note that test[3:8][X == "A"] does not seem to be an option, because I want to do calculations on these rows and save the results in the original data.table. test[3:8] returns a new data.table, so anything computed on it does not end up in test.
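A minimal sketch of why the chained form does not work for this: `test[3:8]` builds a new data.table, so a `:=` applied to it adds the column to that temporary copy and leaves `test` untouched. The column name `W` and the calculation `Y + Z` are hypothetical placeholders for "some calculation".

```r
library(data.table)
set.seed(1)
test <- data.table(X = c(rep("A", 5), rep("B", 5)), Y = rnorm(10), Z = rnorm(10))

# test[3:8] is a new data.table (a copy of rows 3-8); the := that follows
# adds the hypothetical column W to that copy, not to `test`
test[3:8][X == "A", W := Y + Z]

print("W" %in% names(test))  # FALSE: the original table has no W column
```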


Solution

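  • Here 3:8 (length 6) is not the same length as the second expression, X == "A" (length 10), so R recycles it and warns. Moreover, we are combining a numeric index with a logical one, and every nonzero number is treated as TRUE. The result is therefore just X == "A". Instead, convert the first expression to a logical vector by using %in% on the sequence of row numbers. Two things then happen: 1) the lengths become equal, and 2) both operands have the same type.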

    test[(seq_len(.N) %in% 3:8) & X == "A"]
    #    X          Y          Z
    #1: A -0.8356286 -0.6212406
    #2: A  1.5952808 -2.2146999
    #3: A  0.3295078  1.1249309
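Because the combined logical filter is evaluated in `i` of the original table, it can be paired with `:=` to store a calculation directly in `test`, which is what the question asks for. The sketch below uses a hypothetical column `W` and calculation `Y + Z`. Rows that are not selected receive `NA`. It also shows an equivalent integer-index form based on `intersect()` and `which()`, which some readers may find easier to read.

```r
library(data.table)
set.seed(1)
test <- data.table(X = c(rep("A", 5), rep("B", 5)), Y = rnorm(10), Z = rnorm(10))

# Logical filter: TRUE only for rows 3-8 that also have X == "A".
# := adds the hypothetical column W to `test` by reference;
# every row outside the filter gets NA.
test[(seq_len(.N) %in% 3:8) & X == "A", W := Y + Z]
print(test)

# Equivalent integer-index form: row numbers in 3:8 that also match X == "A"
rows <- intersect(3:8, which(test$X == "A"))
print(rows)  # 3 4 5
```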