
Need to modify identify_outliers function in rstatix, but modified function is throwing a strange error


I am trying to modify the identify_outliers function in the rstatix package so that I can pass any coefficient to the is_outlier function when determining outliers. Here is the code for identify_outliers:

function (data, ..., variable = NULL) 
{
    is.outlier <- NULL
    if (is_grouped_df(data)) {
        results <- data %>% doo(identify_outliers, ..., variable = variable)
        if (nrow(results) == 0) 
            results <- as.data.frame(results)
        return(results)
    }
    if (!inherits(data, "data.frame")) 
        stop("data should be a data frame")
    variable <- data %>% get_selected_vars(..., vars = variable)
    n.vars <- length(variable)
    if (n.vars > 1) 
        stop("Specify only one variable")
    values <- data %>% pull(!!variable)
    results <- data %>% mutate(is.outlier = is_outlier(values), 
        is.extreme = is_extreme(values)) %>% filter(is.outlier == 
        TRUE)
    if (nrow(results) == 0) 
        results <- as.data.frame(results)
    results
}
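For reference, the coefficient is a multiplier on the interquartile range. According to the rstatix documentation, is_outlier() flags values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, and is_extreme() uses 3 instead of 1.5. The base R sketch below reproduces that rule with a hypothetical helper, iqr_outlier(), so you can see how changing the coefficient moves the cut-offs. It uses quantile()'s default method and is an illustration of the rule, not rstatix's own code.

```r
# Hypothetical helper (not part of rstatix): TRUE for values outside
# [Q1 - coef * IQR, Q3 + coef * IQR]; NA inputs give NA
iqr_outlier <- function(x, coef = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- coef * (q[2] - q[1])  # coef times the interquartile range
  x < q[1] - fence | x > q[2] + fence
}

x <- c(1, 2, 3, 4, 5, 9, 50)
which(iqr_outlier(x))              # 7: only 50 is flagged at the default 1.5
which(iqr_outlier(x, coef = 0.4))  # 6 7: a smaller coefficient also flags 9
which(iqr_outlier(x, coef = 10))   # integer(0): nothing is flagged
```

A larger coefficient widens the fences and flags fewer points. A smaller one narrows them and flags more.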

Here I've created a function called crazy_outliers by modifying identify_outliers. I've removed the parts pertaining to is_extreme, as I don't need that portion, and I've added an argument y so that a coefficient can be passed to the is_outlier function:

crazy_outliers <- function (data, ..., variable = NULL, y) #added y argument
{
    is.outlier <- NULL
    if (is_grouped_df(data)) {
        results <- data %>% doo(crazy_outliers, ..., variable = variable, y = y) # changed identify_outliers to crazy_outliers and added y argument
        if (nrow(results) == 0) 
            results <- as.data.frame(results)
        return(results)
    }
    if (!inherits(data, "data.frame")) 
        stop("data should be a data frame")
    variable <- data %>% get_selected_vars(..., vars = variable)
    n.vars <- length(variable)
    if (n.vars > 1) 
        stop("Specify only one variable")
    values <- data %>% pull(!!variable)
    results <- data %>% mutate(is.outlier = is_outlier(values, coef = y)) %>% #Here I utilize the y argument to specify the coefficient for determining outliers
        filter(is.outlier == TRUE) # keep only the flagged rows, as identify_outliers does
    if (nrow(results) == 0) 
        results <- as.data.frame(results)
    results
}

However, I receive the following error when trying to use the function:

Error in `mutate()`:
ℹ In argument: `data = map(.data$data, .f, ...)`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `get_selected_vars()`:
! could not find function "get_selected_vars"

I haven't modified the get_selected_vars() function or its arguments at all, and it is called in the original identify_outliers function, so I'm confused about what's going on. I also cannot find which package it comes from: even when I replace it with rstatix::get_selected_vars, I still cannot get the function to work. Any advice is appreciated, thank you!


Solution

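  • get_selected_vars is an unexported utility function from rstatix. A package can define and use internal functions, but they are not made available to users of the package unless they are explicitly exported in its NAMESPACE file. This is also why rstatix::get_selected_vars fails: :: only reaches exported objects. identify_outliers can call get_selected_vars because it is defined inside the rstatix namespace, so R looks up the names it uses there first. You are presumably writing your crazy_outliers function in an R script or notebook rather than editing and reloading the package itself, so its enclosing environment is the global environment and it has no access to get_selected_vars. You can reach the function directly with :::, e.g. rstatix:::get_selected_vars(), but this is risky because packages may change how their internal utility functions are defined with little notice. Alternatively, you can inline your own version of the function.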
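A minimal sketch of the inlining approach. It assumes dplyr and rstatix are installed and that the column is passed as a bare (unquoted) name. Instead of copying get_selected_vars, it forwards the column to mutate() with dplyr's {{ }} embracing operator and calls only the exported rstatix::is_outlier(), so it depends on no rstatix internals. Because mutate() on a grouped data frame evaluates once per group, the doo() branch is not needed. The argument name coef is illustrative; this is not rstatix's own implementation.

```r
library(dplyr)

# Sketch: flag and keep outliers using only exported functions.
# `variable` is a bare column name; `coef` is passed to rstatix::is_outlier().
crazy_outliers <- function(data, variable, coef = 1.5) {
  if (!inherits(data, "data.frame")) {
    stop("data should be a data frame")
  }
  data %>%
    # On a grouped data frame this runs per group, so the quartiles
    # and IQR are computed within each group
    mutate(is.outlier = rstatix::is_outlier({{ variable }}, coef = coef)) %>%
    # Keep only flagged rows (filter() also drops rows where is.outlier is NA)
    filter(is.outlier)
}

# Example usage with made-up data
df <- data.frame(
  g = rep(c("a", "b"), each = 6),
  value = c(1, 2, 3, 4, 5, 50, 10, 11, 12, 13, 14, 15)
)
df %>% group_by(g) %>% crazy_outliers(value)             # one row: g = "a", value = 50
df %>% group_by(g) %>% crazy_outliers(value, coef = 20)  # no rows
```

Unlike identify_outliers, this version takes only a bare column name, not a string via variable =. If you would rather keep your copied code unchanged, setting environment(crazy_outliers) <- asNamespace("rstatix") should let it find get_selected_vars, but it has the same fragility as :::.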