
Issue with 'group_by' function when doing shapiro_test in R


I've asked this question previously with no luck, so here goes again:

My data frame:

    data.type  <- rep(c("DNA", "RNA"), each = 21)
    hour       <- rep(rep(c(1, 2, 24, 48, 96, 168, 672), each = 3), times = 2)
    zotu.count <- c(11,14,16,7,16,15,5,14,13,6,5,17,7,7,12,3,4,5,3,5,4,
                    2,3,2,1,6,2,1,1,1,1,0,0,1,1,4,1,1,1,6,7,6)
    id         <- 1:42
    dataset    <- data.frame(data.type, hour, zotu.count, id)

I am trying to run a Shapiro-Wilk normality test on each data.type/hour group with the following code, but I get this error:

    dataset %>% group_by(data.type, hour) %>% shapiro_test(zotu.count)

    Error: Problem with `mutate()` column `data`.
    ℹ `data = map(.data$data, .f, ...)`.
    x Problem with `mutate()` column `data`.
    ℹ `data = map(.data$data, .f, ...)`.
    x all 'x' values are identical

This is strange, because the same code worked on another dataset with exactly the same structure, and I can't find an explanation anywhere online. Any help would be greatly appreciated!

Thank you!


Solution
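A quick way to locate the groups behind the "all 'x' values are identical" message is to count the distinct zotu.count values in each data.type/hour combination. The sketch below uses only base R and assumes the question's vectors are combined into a data frame named dataset; n_unique and constant_groups are illustrative names.

```r
# Example data from the question, combined into one data frame
dataset <- data.frame(
  data.type  = rep(c("DNA", "RNA"), each = 21),
  hour       = rep(rep(c(1, 2, 24, 48, 96, 168, 672), each = 3), times = 2),
  zotu.count = c(11, 14, 16, 7, 16, 15, 5, 14, 13, 6, 5, 17, 7, 7, 12, 3, 4, 5, 3, 5, 4,
                 2, 3, 2, 1, 6, 2, 1, 1, 1, 1, 0, 0, 1, 1, 4, 1, 1, 1, 6, 7, 6)
)

# Number of distinct zotu.count values in each data.type/hour group
n_unique <- aggregate(zotu.count ~ data.type + hour, data = dataset,
                      FUN = function(x) length(unique(x)))

# Groups with only one distinct value are the ones shapiro.test() rejects
constant_groups <- n_unique[n_unique$zotu.count == 1, c("data.type", "hour")]
print(constant_groups)
```

With the question's data this lists the RNA groups at hours 24 and 168, where every count is 1.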

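  • The error comes from base R's shapiro.test(), which rstatix::shapiro_test() runs on every group: it stops with "all 'x' values are identical" whenever a group has no variation. In your data, the RNA groups at hours 24 and 168 are 1, 1, 1; your other dataset presumably had no such constant group, which would explain why the same code worked there. One option is an if/else condition that checks whether 'zotu.count' has more than one unique value in a group and only then applies shapiro_test():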

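    library(rstatix)
    library(dplyr)
    library(tidyr)
    dataset %>% 
      group_by(data.type, hour) %>%
      # constant groups get NA; all others get a one-row shapiro_test() tibble
      summarise(out = if (n_distinct(zotu.count) == 1) list(NA) 
                      else list(shapiro_test(zotu.count)),
                .groups = 'drop') %>% 
      # expand the list-column into regular columns
      unnest(out)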
    

    Output:

    # A tibble: 14 × 5
       data.type  hour variable   statistic p.value
       <chr>     <dbl> <chr>          <dbl>   <dbl>
     1 DNA           1 zotu.count     0.987   0.780
     2 DNA           2 zotu.count     0.832   0.194
     3 DNA          24 zotu.count     0.832   0.194
     4 DNA          48 zotu.count     0.812   0.144
     5 DNA          96 zotu.count     0.75    0    
     6 DNA         168 zotu.count     1       1.00 
     7 DNA         672 zotu.count     1       1.00 
     8 RNA           1 zotu.count     0.75    0    
     9 RNA           2 zotu.count     0.893   0.363
    10 RNA          24 <NA>          NA      NA    
    11 RNA          48 zotu.count     0.75    0    
    12 RNA          96 zotu.count     0.75    0    
    13 RNA         168 <NA>          NA      NA    
    14 RNA         672 zotu.count     0.75    0    
    

    Alternatively, filter out the groups that have only a single unique value before running the test; they are then dropped from the result instead of appearing as NA rows:

    dataset %>% 
      group_by(data.type, hour) %>% 
      # drop groups in which every zotu.count value is identical
      filter(n_distinct(zotu.count) > 1) %>% 
      shapiro_test(zotu.count)
    # A tibble: 12 × 5
       data.type  hour variable   statistic     p
       <chr>     <dbl> <chr>          <dbl> <dbl>
     1 DNA           1 zotu.count     0.987 0.780
     2 DNA           2 zotu.count     0.832 0.194
     3 DNA          24 zotu.count     0.832 0.194
     4 DNA          48 zotu.count     0.812 0.144
     5 DNA          96 zotu.count     0.75  0    
     6 DNA         168 zotu.count     1     1.00 
     7 DNA         672 zotu.count     1     1.00 
     8 RNA           1 zotu.count     0.75  0    
     9 RNA           2 zotu.count     0.893 0.363
    10 RNA          48 zotu.count     0.75  0    
    11 RNA          96 zotu.count     0.75  0    
    12 RNA         672 zotu.count     0.75  0
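A third option, sketched here in base R so it does not depend on rstatix or dplyr, wraps stats::shapiro.test() in tryCatch(). Any group the test rejects then returns NA instead of stopping the whole computation. Unlike the n_distinct() check, this also covers groups with fewer than 3 observations. safe_shapiro(), keys and results are hypothetical names chosen for this sketch and are not part of any package.

```r
# Example data from the question, combined into one data frame
dataset <- data.frame(
  data.type  = rep(c("DNA", "RNA"), each = 21),
  hour       = rep(rep(c(1, 2, 24, 48, 96, 168, 672), each = 3), times = 2),
  zotu.count = c(11, 14, 16, 7, 16, 15, 5, 14, 13, 6, 5, 17, 7, 7, 12, 3, 4, 5, 3, 5, 4,
                 2, 3, 2, 1, 6, 2, 1, 1, 1, 1, 0, 0, 1, 1, 4, 1, 1, 1, 6, 7, 6)
)

# Hypothetical helper: run stats::shapiro.test() and return NA for W and p
# when the test refuses a group (identical values, or fewer than 3 values)
safe_shapiro <- function(x) {
  res <- tryCatch(stats::shapiro.test(x), error = function(e) NULL)
  if (is.null(res)) {
    return(c(statistic = NA_real_, p = NA_real_))
  }
  c(statistic = unname(res$statistic), p = res$p.value)
}

# One row per data.type/hour combination, in order of first appearance
keys <- unique(dataset[c("data.type", "hour")])

# Test each group and stack the one-row results
results <- do.call(rbind, lapply(seq_len(nrow(keys)), function(i) {
  in_group <- dataset$data.type == keys$data.type[i] & dataset$hour == keys$hour[i]
  w <- safe_shapiro(dataset$zotu.count[in_group])
  data.frame(data.type = keys$data.type[i], hour = keys$hour[i],
             statistic = w[["statistic"]], p = w[["p"]])
}))
rownames(results) <- NULL
print(results)
```

Because tryCatch() catches every error, any other failure inside shapiro.test() would also be silently turned into NA. It is therefore worth checking which rows come back as NA rather than assuming they are all constant groups.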