
Error when creating function: 'recursive indexing failed'


I'm trying to create a function that, when given a data frame and a column, uses Rosner's test (EnvStats::rosnerTest) to identify the outliers and returns a new data frame so I can inspect each outlier.

I am able to achieve this without using a function, but because I have a data frame with many variables, I would like to create a function to automate this more quickly. (My previous post shows the workflow for doing this one variable at a time.)

Here are my data:

> dput(head(data))
structure(list(cap_date = structure(c(4856, 4860, 4860, 4861, 
4866, 4867), class = "Date"), cap_year = c(1983L, 1983L, 1983L, 
1983L, 1983L, 1983L), age_class = c("A", "S", "S", "A", "A", "A"), sex = 
c("F", "F", "F", "F", "F", "F"), alt = c(11, 12, 15.67000008, 7, 14.5, 
17.5), alb = c(2.599999905, 5.369999886, 4.670000076, 4.429999828, 3.75, 
3.700000048), alp = c(9, 86.33000183, 28, 170.6699982, 12, 82.5), 
tbil = c(0.200000003, 1.070000052, 0.430000007, 1.169999957, 
0.300000012, 0.400000006), bun = c(20, 17, 11.32999992, 56.33000183, 
7.5, 45), calcium = c(NA, 8.930000305, 8.800000191, 8.970000267, NA, 
7.550000191), crea = c(0.5, 0.569999993, 0.529999971, 0.600000024, 
1.049999952, 0.75), phos = c(2.75, 4.099999905, 4.96999979, 
5.329999924, 4.099999905, 7.400000095), pot = c(5.550000191, 
6.730000019, 3.869999886, 4.269999981, 3.049999952, 6.849999905), tp 
= c(4.449999809, 6.769999981, 5.800000191, 6.769999981, 5.75, 
6.400000095), sodium = c(NA, 142, 127, 138.3300018, 164, 139), glob = 
c(1.849999905, 1.400000095, 1.130000114, 2.340000153, 2, 
2.700000048), cortisol = c(4.24, 7.2231, 4.5431, NA, 6.0874, 4.8727), 
row = c(1L, 2L, 3L, 4L, 6L, 7L)), row.names = c(1L, 2L, 3L, 4L, 6L, 
7L), class = "data.frame")

Here is my code:

library("EnvStats")
library("dplyr")
detect.outlier <- function(df, i, k) {  # i is a column/variable, and k is an input in the Rosner test

  plot(df$year, df[[i]], xlab = "Year", ylab = "Value") # I also want to print the plot

  ros.test <- rosnerTest(df[[i]], k)

  ros.results <- ros.test$all.stats

  ros.outliers <- ros.results %>% filter(Outlier) %>% select(Obs.Num) # filter by outlier = TRUE ; Obs.Num corresponds with row number in my data frame

  ros.outliers <- ros.outliers[,1]  # change from a data frame to a vector 

  outlier_df <- df[df$row %in% ros.outliers,]

  return(outlier_df %>% select(age_class, sex, i))

}

I try to run the function:

detect.outlier(data, alt, 20)

But I get an error:

Error during wrapup: recursive indexing failed at level 2

Error: no more error handlers available (recursive errors?); invoking 'abort' restart

I'm not sure what this means or how to fix it - any help would be greatly appreciated. Thank you so much!

Edit: Sometimes when I run the function I also get this error:

Error in rosnerTest(data$variable, k) : 'x' must be a numeric vector

This seems odd, because class(data$alt) returns "numeric".
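The call shown in that message, rosnerTest(data$variable, k), points to one likely cause. The $ operator takes the name after it literally, so data$variable looks for a column called variable. It does not look for the column whose name is stored in an argument called variable. A column that does not exist comes back as NULL, which is not numeric, even though data$alt itself is numeric. Here is a small base-R sketch with a made-up data frame. The names toy, get_col_dollar and get_col_brackets are hypothetical:

```r
# Toy data frame (hypothetical values) with a numeric column "alt"
toy <- data.frame(alt = c(11, 12, 15.67), sex = c("F", "F", "F"))

# `$` never substitutes an argument: this looks for a column named "variable"
get_col_dollar <- function(df, variable) df$variable

# `[[` uses the *value* of the argument, e.g. the string "alt"
get_col_brackets <- function(df, variable) df[[variable]]

dollar_result  <- get_col_dollar(toy, "alt")    # NULL, so not numeric
bracket_result <- get_col_brackets(toy, "alt")  # the numeric alt column
print(class(toy$alt))                           # "numeric"
```

Inside a function that receives a column name, df[[variable]] is the form to use.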

EDIT: Yama's solution is correct. While checking that the code returned the correct outliers, I noticed that the Rosner test returns an "Obs.Num" that differs from the row numbers. Here is an example that runs the function's code step by step:

> ros.test <- rosnerTest(df$crea, k = 10)

Warning message:
In rosnerTest(df$crea, k = 10) :
3 observations with NA/NaN/Inf in 'x' removed.

> ros.results <- ros.test$all.stats

> print(ros.results)
   i   Mean.i      SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
1  0 1.078450 0.3102488  3.35     222 7.321705   4.053146    TRUE
2  1 1.076295 0.3023919  2.55      12 4.873495   4.052913    TRUE
3  2 1.074895 0.2991009  2.35    1047 4.263125   4.052680    TRUE
4  3 1.073683 0.2966446  2.30     877 4.133960   4.052447    TRUE
5  4 1.072516 0.2943607  2.10     801 3.490560   4.052214   FALSE
6  5 1.071538 0.2927857  2.00     293 3.171133   4.051980   FALSE
7  6 1.070653 0.2915166  2.00     373 3.187974   4.051746   FALSE
8  7 1.069766 0.2902367  1.95     633 3.032814   4.051512   FALSE
9  8 1.068925 0.2890959  1.90     103 2.874737   4.051278   FALSE
10 9 1.068131 0.2880883  1.85     548 2.713992   4.051043   FALSE

> # 4 outliers flagged - obs. num 222, 12, 1047, and 877

> crea <- df[c(222, 12, 1047, 877),]

> crea %>% select(age_class, sex, crea, row)

     age_class sex crea  row
236          A   F 3.35  236
13           A   M 2.55   13
1154         A   M 2.35 1154
969          A   M 2.30  969

Here the row numbers differ from the observation numbers: Obs.Num 222 is row 236, 12 is row 13, 1047 is row 1154, and 877 is row 969.

This causes a problem later in my function, at the line

outlier_df <- df[df$row %in% ros.outliers,]

because that line then selects the wrong rows.
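The output above is consistent with Obs.Num being a position in the vector passed to rosnerTest. df[222, ] contains the flagged value 3.35, while its row column says 236 because rows were removed from the data earlier. Matching those positions against the row column therefore picks different rows. Here is a small base-R illustration with made-up values. The names toy and obs_num are hypothetical:

```r
# Toy data: `row` keeps the original row numbers, but some rows were
# dropped earlier, so position and `row` no longer agree after row 4
toy <- data.frame(crea = c(0.5, 0.57, 0.53, 0.6, 1.05, 0.75, 3.35),
                  row  = c(1L, 2L, 3L, 4L, 7L, 8L, 10L))

# Suppose the test flagged observation 7, i.e. the 7th value of toy$crea
obs_num <- 7L

by_position <- toy[obs_num, ]              # 7th row: crea 3.35, row 10
by_row_col  <- toy[toy$row %in% obs_num, ] # row == 7 is the 5th row: crea 1.05
```

Under that assumption, the function would index by position, df[ros.outliers, ], rather than with df[df$row %in% ros.outliers, ]. Because rosnerTest drops NA values before testing, it is worth checking whether the positions refer to the original vector or to the shortened one. Passing only the non-missing values yourself avoids that question.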

Any help is super appreciated!!


Solution

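  • Your function uses i exactly as you pass it. When you call detect.outlier(data, alt, 20), i is the unquoted name alt, not the string "alt". So inside detect.outlier() the code that runs is plot(df$year, df[[alt]], xlab = "Year", ylab = "Value"), when it should be plot(df$year, df[["alt"]], xlab = "Year", ylab = "Value"). If no object called alt exists in your workspace, R stops with "object 'alt' not found". If one does exist, its values are used as the index, which is a likely source of the confusing "recursive indexing failed" message.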

    You can fix that by passing the column name as a string: detect.outlier(data, "alt", 20).

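    You apparently have another problem in your code:

    Error in xy.coords(x, y, xlabel, ylabel, log) :
    'x' and 'y' lengths differ

    Your data has no year column (it is called cap_year), so df$year is NULL and plot() receives an x of length 0. Using df$cap_year should fix that. The string fix above should already get you further.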

    EDIT: you should state which package the rosnerTest function comes from.
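Here is one possible rewrite that combines the three fixes: a quoted column name, cap_year instead of year, and positional indexing instead of matching the row column. It is a sketch, not the answerer's code. The name detect_outliers and the year_col and keep arguments are hypothetical. The sketch assumes that Obs.Num refers to positions in the vector given to rosnerTest. To avoid any doubt about how NAs are numbered, it passes only the finite values and then maps the positions back to rows of the data frame. It uses base R plus EnvStats::rosnerTest, so dplyr is not needed.

```r
# Sketch of a corrected outlier helper (hypothetical name: detect_outliers).
# EnvStats must be installed; it is called with :: so library() is not needed.
detect_outliers <- function(df, col, k = 3, year_col = "cap_year",
                            keep = c("age_class", "sex")) {
  # `col` must be a single character string naming a column, e.g. "alt"
  stopifnot(is.character(col), length(col) == 1, col %in% names(df))

  x <- df[[col]]

  # Plot values against capture year (the column is cap_year, not year)
  plot(df[[year_col]], x, xlab = "Year", ylab = col)

  # Keep only finite values and remember their row positions in df
  ok <- which(is.finite(x))
  ros <- EnvStats::rosnerTest(x[ok], k = k)
  stats <- ros$all.stats

  # Obs.Num indexes x[ok]; translate the flagged ones back to rows of df
  outlier_pos <- ok[stats$Obs.Num[stats$Outlier]]

  df[outlier_pos, c(keep, col), drop = FALSE]
}
```

To check many variables, you can loop over their names, for example lapply(c("alt", "alb", "crea"), function(v) detect_outliers(data, v, k = 10)). This works because the column name is passed as a string, so it can come from a loop variable.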