Getting error about logical vs. double vector when subsetting


In R, I used Rosner's test EnvStats::rosnerTest to identify outliers in my dataset. My end goal is to have a vector of row numbers in my dataset where Outlier = TRUE.

var.ros.test <- rosnerTest(df$var, k = 20)
class(var.ros.test)
[1] "gofOutlier"
print(var.ros.test$all.stats)

This is the output of var.ros.test$all.stats. I highlighted the Obs.Num that I want to save in a vector when Outlier = TRUE.

[Image: screenshot of the var.ros.test$all.stats output, with the Obs.Num values highlighted where Outlier = TRUE]

I started with the code below, but I am stuck: it returns every Obs.Num, when I only want the Obs.Num values where Outlier = TRUE.

var.out <- subset(var.ros.test$all.stats, select = "Obs.Num")
print(var.out)
      Obs.Num
1      977
2      91
3      384
4      97
5      281
.
.
.
> class(var.out)
[1] "data.frame"

> var.out <- var.out[,1]
> var.out <- as.vector(var.out) %>% unlist(var.out)
> print(var.out)
[1]  977   91  384   97  281   65  512  331    6 1041   39
[12]    2  147   69  856  133  329  577 1017  104

That would be fine, except it also contains the Obs.Num values where Outlier = FALSE.

Previously, I tried doing this:

var.out <- subset(var.ros.test$all.stats, select = "Obs.Num") %>%
filter(var.ros.test$all.stats, Outlier == "TRUE")

But I get this error:

Error in filter():
ℹ In argument: var.ros.test$all.stats.
Caused by error:
! ..1$i must be a logical vector, not a double vector.

I would greatly appreciate any tips on how to get a vector of the Obs.Num values where Outlier = TRUE. Thank you so much!


Solution

  • Not sure if this is the only problem, but you're filtering against the character string "TRUE". I assume Outlier is a logical vector.

    If this is your data

    library(dplyr)  # provides %>% and filter()

    set.seed(42)
    
    df <- data.frame(dat = 1:10, 
                     sam = sample(10), 
                     Outlier = sample(c(TRUE, FALSE), 10, replace = TRUE))
    
    df
       dat sam Outlier
    1    1   1    TRUE
    2    2   5   FALSE
    3    3  10    TRUE
    4    4   8   FALSE
    5    5   2    TRUE
    6    6   4    TRUE
    7    7   6   FALSE
    8    8   9   FALSE
    9    9   7   FALSE
    10  10   3   FALSE
    

    with the class of Outlier being

    class(df$Outlier)
    [1] "logical"
    

    then filter() only needs the column itself, since it is already logical.

    df %>% filter(Outlier)
      dat sam Outlier
    1   1   1    TRUE
    2   3  10    TRUE
    3   5   2    TRUE
    4   6   4    TRUE
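
To get from the filtered rows to the vector the question asks for, one more step is needed: pulling out the Obs.Num column. Here is a minimal sketch, assuming all.stats has a logical Outlier column like the example above. The small data frame is a made-up stand-in for var.ros.test$all.stats (the Obs.Num values come from the question, but the Outlier flags are invented), so it runs without EnvStats:

```r
library(dplyr)

# Hypothetical stand-in for var.ros.test$all.stats. Column names follow the
# question; the Outlier flags are invented purely for illustration.
all.stats <- data.frame(
  i       = 0:4,
  Obs.Num = c(977, 91, 384, 97, 281),
  Outlier = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Base R: index the Obs.Num column with the logical Outlier column.
out.rows <- all.stats$Obs.Num[all.stats$Outlier]

# dplyr: keep the flagged rows, then pull the column out as a plain vector.
out.rows.dplyr <- all.stats %>%
  filter(Outlier) %>%
  pull(Obs.Num)

print(out.rows)
```

With the real test result, replacing all.stats with var.ros.test$all.stats should give the same kind of vector. As for the original error, as far as I can tell it comes from piping a data frame into filter() and also passing var.ros.test$all.stats as an argument. The pipe already supplies the data, so the second data frame is treated as a filter condition, and its first column (i) is numeric rather than logical.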