
spread() error in anova_test(): how do I make the keys unique?


I'm trying to run a two-way repeated-measures ANOVA to test the effects of condition and time on systolic BP. I'm using the anova_test() function from rstatix, but I get this error:

Error in `spread()`:
! Each row of output must be identified by a unique combination of keys.
ℹ Keys are shared for 6 rows
• 14, 36
• 182, 204
• 98, 120

I'm not sure why these rows are being treated as non-unique. Here are the rows the error points to:

> df_lbp[c(14,36),]
# A tibble: 2 × 10
  subject_id condition visit  time  syst  timef      conditionf
       <dbl>     <dbl> <dbl> <int> <dbl>  <fct>      <fct>     
1        129         0     1     1  106.  anticipate Control   
2        165         1     1     1  119   anticipate Stress    
> df_lbp[c(182, 204),]
# A tibble: 2 × 10
  subject_id condition visit  time  syst  timef    conditionf
       <dbl>     <dbl> <dbl> <int> <dbl>  <fct>    <fct>     
1        129         0     1     3  103.  recovery Control   
2        165         1     1     3  121.  recovery Stress    
> df_lbp[c(98, 120),]
# A tibble: 2 × 10
  subject_id condition visit  time  syst  timef conditionf
       <dbl>     <dbl> <dbl> <int> <dbl>  <fct> <fct>     
1        129         0     1     2  102.  task  Control   
2        165         1     1     2  128   task  Stress 

I'd like to understand what R is using as the keys, and I'd appreciate any help getting this to work. My code and data are below.


library(rstatix)

a1 <- anova_test(data = df_lbp, dv = syst,
                 wid = subject_id,
                 within = c(timef, conditionf))

get_anova_table(a1)

dput(df_lbp)
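For reference, you can check the layout that a within-subjects anova_test() call like this one expects. Judging by the spread() in the error, the function reshapes the data to wide format. That reshape needs exactly one row per subject_id × timef × conditionf combination. The minimal base-R sketch below uses small made-up data (the subject IDs and syst values are hypothetical, not taken from df_lbp). It builds a balanced layout, duplicates one subject's Stress rows, and flags every row whose key combination repeats.

```r
# Hypothetical toy data: 3 subjects, each measured under 2 conditions
# at 3 time points, so 3 x 3 x 2 = 18 rows in long format
toy <- expand.grid(
  subject_id = c(101, 102, 103),
  timef      = factor(c("anticipate", "task", "recovery"),
                      levels = c("anticipate", "task", "recovery")),
  conditionf = factor(c("Control", "Stress"), levels = c("Control", "Stress"))
)
toy$syst <- round(seq(100, 134, length.out = nrow(toy)), 1)

# The columns that together must identify each row (wid + within variables)
keys <- c("subject_id", "timef", "conditionf")

# Balanced design: 0 means no key combination appears more than once
anyDuplicated(toy[keys])

# Simulate the problem: subject 103's Stress rows are entered twice
dup_rows <- toy[toy$subject_id == 103 & toy$conditionf == "Stress", ]
toy_bad  <- rbind(toy, dup_rows)

# Flag every row (original and copy) whose key combination is repeated
is_dup <- duplicated(toy_bad[keys]) | duplicated(toy_bad[keys], fromLast = TRUE)
toy_bad[is_dup, keys]
```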


Solution

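  • The keys are the combinations of your wid and within variables (subject_id, timef and conditionf). Subject 161 has two rows for each timef level in the Stress condition, so those combinations are not unique: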

    library(dplyr)
    
    df_lbp |>
      count(subject_id, timef, conditionf) |>
      filter(n > 1)
    

    output:

    # A tibble: 3 x 4
      subject_id timef      conditionf     n
           <dbl> <fct>      <fct>      <int>
    1        161 anticipate Stress         2
    2        161 task       Stress         2
    3        161 recovery   Stress         2
    

    Without these duplicates, anova_test() runs without error:

    df_lbp |>
      filter(subject_id != 161) |>
      rstatix::anova_test(dv = syst,
                          wid = subject_id, 
                          within = c(timef, conditionf)
                          )
    

    output:

    ANOVA Table (type III tests)
    
    $ANOVA
                Effect DFn DFd      F        p p<.05   ges
    1            timef   2  46 38.931 1.28e-10     * 0.076
    2       conditionf   1  23 28.937 1.83e-05     * 0.284
    3 timef:conditionf   2  46 37.078 2.57e-10     * 0.087
    ## etc.
    

    Edit: as r2evans pointed out, instead of finding the duplicates first and excluding them, you can keep only the distinct combinations of the key variables. Note that distinct() keeps the first (topmost) row of each set of duplicates:

    df_lbp |>
      distinct(subject_id, timef, conditionf,
               .keep_all = TRUE
               ) |>
      rstatix::anova_test(dv = syst,
                          wid = subject_id, 
                          within = c(timef, conditionf)
                          )
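Base R supports the same keep-first rule if you would rather not load dplyr. duplicated() marks every repeat after the first occurrence, so indexing with !duplicated() on the three key columns should keep the same rows as distinct(..., .keep_all = TRUE). If the repeated rows are genuine repeat readings rather than entry errors, you could instead average them with aggregate(). That is a different choice from the keep-first and drop-the-subject approaches above. Here is a sketch on hypothetical toy data, where subject 2 has its task reading entered twice:

```r
# Hypothetical long-format data: subject 2's "task"/"Stress" cell has two rows
toy <- data.frame(
  subject_id = c(1, 1, 2, 2, 2),
  timef      = factor(c("anticipate", "task", "anticipate", "task", "task")),
  conditionf = factor(rep("Stress", 5)),
  syst       = c(110, 120, 115, 125, 129)
)
keys <- c("subject_id", "timef", "conditionf")

# Option 1: keep the first (topmost) row of each key combination,
# the same rule distinct(..., .keep_all = TRUE) applies
toy_first <- toy[!duplicated(toy[keys]), ]
toy_first

# Option 2: average the duplicated readings instead of dropping any
toy_mean <- aggregate(syst ~ subject_id + timef + conditionf,
                      data = toy, FUN = mean)
toy_mean
```

Either result has one row per key combination, which is what the wide reshape needs.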