Search code examples
rstatisticsanovarstatix

Problem with 'mutate()' input 'data' in ANOVA (rstatix)


This is driving me crazy. I am using anova_test from rstatix and it's telling me that my columns aren't there when they clearly are.

This is what my dataframe looks like:

ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3) 
Form = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B")
Pen = c("Red", "Blue", "Green", "Red", "Blue", "Green", "Red", "Blue", "Green","Red", "Blue", "Green","Red", "Blue", "Green","Red", "Blue", "Green")
Time = c(20, 4, 6, 2, 76, 3, 86, 35, 74, 94, 14, 35, 63, 12, 15, 73, 87, 33)
df <- data.frame(ID, Form, Pen, Time)

ID, Form, and Pen are factors, while Time is numeric. So each subject completed forms A and B with Red, Blue, and Green pens, and I measured how long each took in completing the form.

This is a fake dataset that I've purposefully come up with to ask this question. In reality, this dataframe is derived from a larger dataset with several more variables. Each variable has a lot more observations (so not just one datapoint for subject 1 & Form A & Red Pen, as in this example, but multiple), so I've summarized them to get mean Time.

df <- original.df %>% dplyr::select(ID, Form, Pen, Time)
df <- df %>% dplyr::group_by(ID, Form, Pen) %>% dplyr::summarise(Time = mean(Time))
df <- df %>% convert_as_factor(ID, Form, Pen)
df$Time <- as.numeric(df$Time)

I wanted to test the main and interaction effects, so I'm doing a 2 by 3 repeated measures ANOVA (a two-way ANOVA, because Form and Pen are two independent variables).

aov <- rstatix::anova_test(data = df, dv = Time, wid = ID, within = c(Form, Pen))

and I KEEP getting this error:

Error: Problem with `mutate()` input `data`.
x Can't subset columns that don't exist.
x Columns `ID` and `Form` don't exist.
ℹ Input `data` is `map(.data$data, .f, ...)`.

WHY?! Any help would be greatly appreciated. I've been searching solutions for HOURS and I'm getting pretty frustrated.


Solution

  • Thank you for adding the additional details to the post - based on what you've provided it looks like you need to ungroup your df before passing it to anova_test(), e.g.

    #install.packages("rstatix")
    library(rstatix)
    library(tidyverse)
    
    ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3) 
    Form = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B")
    Pen = c("Red", "Blue", "Green", "Red", "Blue", "Green", "Red", "Blue", "Green","Red", "Blue", "Green","Red", "Blue", "Green","Red", "Blue", "Green")
    Time = c(20, 4, 6, 2, 76, 3, 86, 35, 74, 94, 14, 35, 63, 12, 15, 73, 87, 33)
    original.df <- data.frame(ID, Form, Pen, Time)
    
    df <- original.df %>%
      dplyr::select(ID, Form, Pen, Time)
    df <- df %>%
      dplyr::group_by(ID, Form, Pen) %>%
      dplyr::summarise(Time = mean(Time))
    df <- df %>%
      convert_as_factor(ID, Form, Pen)
    df$Time <- as.numeric(df$Time)
    df <- ungroup(df)
    
    aov <- rstatix::anova_test(data = df, dv = Time, wid = ID, within = c(Form, Pen))
    

    You can see whether a dataframe is grouped using str(), e.g. str(df) before and after ungrouped() shows you the difference. Please let me know if you are still getting errors after making this change