R describeBy function: subscript out of bounds error


I'm fairly new to R and I'm trying to get descriptive statistics grouped by multiple variables using the describeBy function from the psych package.

Here's what I'm trying to run:

JL <- describeBy(df$JL, group=list(df$Time, df$Cohort, df$Gender), digits=3, skew=FALSE, mat=TRUE)

This gives the following error message: Error in `[<-`(`*tmp*`, var, group + 1, value = dim.names[[group]][[groupi]]) : subscript out of bounds

I only get this error when my Gender variable (which is dichotomous in this dataset) is included in the grouping. The code runs when I take out the mat=TRUE argument, and in that output I see that some of the groupings involving Gender come back as NULL. Other answers suggest the error means an index is outside the bounds of an array, but I'm not sure how to troubleshoot it. Any advice is appreciated.

Thanks so much.


Solution

  • You could use dplyr, with some custom functions added.

    library(dplyr)
    
    se <- function(x) sd(x, na.rm=TRUE)/sqrt(length(na.omit(x)))
    rnge <- function(x) diff(range(x, na.rm=TRUE))
    
    group_by(df, Time, Cohort, Gender) %>%
      summarise_at(vars(JL), .funs=list(n=length, mean=mean, sd=sd, min=min, max=max, range=rnge, se=se)) %>% 
      as.data.frame()
    

    Using the mtcars dataset:

    group_by(mtcars, vs, am, cyl) %>%
      summarise_at(vars(mpg), .funs=list(n=length, mean=mean, sd=sd, min=min, max=max, range=rnge, se=se)) %>% as.data.frame()
    
      vs am cyl  n mean    sd  min  max range    se
    1  0  0   8 12 15.1 2.774 10.4 19.2   8.8 0.801
    2  0  1   4  1 26.0    NA 26.0 26.0   0.0    NA
    3  0  1   6  3 20.6 0.751 19.7 21.0   1.3 0.433
    4  0  1   8  2 15.4 0.566 15.0 15.8   0.8 0.400
    5  1  0   4  3 22.9 1.453 21.5 24.4   2.9 0.839
    6  1  0   6  4 19.1 1.632 17.8 21.4   3.6 0.816
    7  1  1   4  7 28.4 4.758 21.4 33.9  12.5 1.798
    

    Using the describeBy function from the psych package on the same data reproduces your error:

    library(psych)
    describeBy(mtcars$mpg, group=list(mtcars$vs, mtcars$am, mtcars$cyl), digits=3, skew=FALSE, mat=TRUE)
    

    Error in `[<-`(`*tmp*`, var, group + 1, value = dim.names[[group]][[groupi]]) : subscript out of bounds

    This happens because not all combinations of the three grouping variables exist in the data, as a cross-tabulation shows:

    with(mtcars,
         ftable(table(vs,am,cyl)))
    #      cyl  4  6  8
    #vs am             
    #0  0       0  0 12
    #   1       1  3  2
    #1  0       3  4  0
    #   1       7  0  0
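
To check whether your own data has the same problem, you can count the rows in every cell and list the empty ones. The sketch below uses only base R on mtcars. The names cells, empty and grp are illustrative, and the commented-out describeBy call at the end is an untested assumption, not a confirmed fix.

```r
# Count rows in every vs x am x cyl cell, including cells with no data
cells <- as.data.frame(table(vs = mtcars$vs, am = mtcars$am, cyl = mtcars$cyl))
empty <- cells[cells$Freq == 0, ]

nrow(cells)   # number of possible combinations
nrow(empty)   # number of combinations with no rows
empty         # the missing combinations themselves

# Collapse the three variables into one factor, dropping unused combinations
grp <- interaction(mtcars$vs, mtcars$am, mtcars$cyl, drop = TRUE)
nlevels(grp)  # number of combinations that actually occur

# Assumption (not verified here): a single grouping factor with no empty
# levels may avoid the out-of-bounds error when mat = TRUE.
# psych::describeBy(mtcars$mpg, group = grp, skew = FALSE, mat = TRUE)
```

For mtcars this reports 12 possible combinations, 5 of them empty, which matches the zeros in the ftable output and the 7 rows in the dplyr result. With your data, substitute df$Time, df$Cohort and df$Gender for the three mtcars columns.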