r, na, missing-data, rstatix

I have no NAs in my dataset but I still get the error missing value where TRUE/FALSE needed?


I want to calculate the effect size of my variables. I am getting the error "missing value where TRUE/FALSE needed" even though I purged my data.frame of NAs beforehand. Any idea why this is happening?

I am using the cohens_d() function from rstatix. Session info: R version 4.2.2 (2022-10-31 ucrt); Platform: x86_64-w64-mingw32/x64 (64-bit); Running under: Windows 10 x64 (build 19044).

My data.frame looks like this:

structure(list(y = c(7.18497519069826, 7.3003780648707, 7.17955179116519, 
8.36921585741014, 8.15836249209525, 7.09061070782841, 7.49108141342319, 
7.1846914308176, 6.67089495352021, 6.69143515214406, 6.42357351973274, 
7.52608069180203, 7.24501887073775, 6.85901814388889, 7.57170883180869, 
7.33425264233423, 8.04921802267018, 7.03181227133037, 7.59494473669508, 
7.19479175772192, 7.50365451924296, 7.98766626492627, 7.69670578093392, 
7.60357736815147, 6.96018527660461, 6.87390159786446, 7.06818586174616, 
7.73303668293358, 7.00902574208691, 7.43980621139333, 7.21563756343506, 
7.28869626059026, 7.16435285578444, 8.40397796366936, 8.11092624226642, 
6.87139778148748, 7.28510702956681, 7.28533222764388, 7.09131515969722, 
6.75541746281094, 7.48515334990365, 7.04727486738418, 7.05153839051533, 
6.94610823043691, 6.88677264305444, 7.17522180034305, 8.01535975540921, 
6.97657921864011, 7.44994098877334, 7.24328614608345, 6.94987770403687, 
7.0265332645233, 7.03662889536216, 6.7070589406276, 7.44075170047919, 
6.58972625625424, 6.75913881628117, 7.41597441137657, 7.57460994134019
), x = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
), levels = c("untreated", "VRZ", "AMB", "untreated_107"), class = "factor")), row.names = c(NA, 
-59L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c(`58` = 58L), class = "omit"))

r_test %>%
  cohens_d(y ~ x) %>%
  as.data.frame()

Any idea what the problem is?

Similarly, when I tried to use the function wilcox_effsize() instead, R returns the following error: "can't deal with factors containing only one level"

When I used this very similar data frame, the analysis worked even though it contained NAs:

structure(list(y = c(9.91e+08, 8.17e+08, 461200000, 15330000, 
175100000, 50320000, 13590000, 22970000, 2778000, 3453000, 12890000, 
375900000, 44590000, 1.611e+09, 1e+09, 889900000, 373200000, 
NA, NA, NA, NA, NA, 5010000, 6549000, 23160000, 32520000, 7707000, 
556900000, 634600000, 820900000, 391400000, 498300000, 147900000, 
646900000, 22060000, 1e+07, 306800000, 319400000, 41290000, 94100000, 
127200000, 117200000, 618300000, 570700000, 617100000, 284900000, 
449600000, 3866000, 6918000, 4177000, 14870000, 29380000, 2815000, 
1619000, 3126000, 1710000, 2191000), x = structure(c(1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("untreated", "VRZ", "AMB"
), class = "factor")), row.names = c(NA, -57L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • EDIT:

    The problem is that there is one unused factor level, namely untreated_107. There are several ways to deal with this situation:

    Use droplevels from base R:

    library(rstatix)
    library(tidyverse)
    r_test %>%
      mutate(x = droplevels(x)) %>%
      cohens_d(y ~ x) %>%
      as.data.frame()
      .y.    group1    group2    effsize n1 n2 magnitude
    1   y       AMB untreated -1.1805582 19 20     large
    2   y       AMB       VRZ -0.4735816 19 20     small
    3   y untreated       VRZ  0.6551090 20 20  moderate
    

    With fct_drop from forcats:

    library(forcats)
    library(rstatix)
    library(tidyverse)
    r_test %>%
      mutate(x = fct_drop(x)) %>%  # forcats equivalent of droplevels()
      cohens_d(y ~ x) %>%
      as.data.frame()
    

    Or sidestep the unused factor level altogether by converting x to character (conceptually questionable, though, since x is presumably a factor for a reason):

    library(rstatix)
    library(tidyverse)
    r_test %>%
      mutate(x = as.character(x)) %>%
      cohens_d(y ~ x) %>%  
      as.data.frame()
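
To see why data without NAs can still trigger this error, it helps to look at the level counts directly. The following is a minimal base-R sketch, not a look into rstatix internals: `df` is a made-up stand-in for `r_test` with invented values, and the `if()` call only imitates the kind of comparison that fails when a group is empty. In the question's data, the `na.action` attribute suggests that row 58 was dropped by `na.omit()`; presumably that was the only `untreated_107` row, so the level stayed declared but lost all its observations.

```r
# Hypothetical stand-in for r_test: a factor with a declared but unused level.
# The y values are invented for illustration only.
df <- data.frame(
  y = c(7.2, 7.3, 7.1, 6.9, 7.5, 7.0),
  x = factor(
    c("untreated", "untreated", "VRZ", "VRZ", "AMB", "AMB"),
    levels = c("untreated", "VRZ", "AMB", "untreated_107")
  )
)

# There are no NAs in the data, yet one level has zero observations.
print(anyNA(df))
counts <- table(df$x)
print(counts)
unused <- names(counts)[counts == 0]
print(unused)

# Summary statistics of the empty group are undefined (NaN here) ...
empty_mean <- mean(df$y[df$x == "untreated_107"])

# ... and using such a value as an if() condition raises the error.
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(
  if (empty_mean > 0) "positive" else "not positive",
  error = function(e) conditionMessage(e)
)
print(msg)

# droplevels() also accepts a whole data frame and cleans every factor column.
df_clean <- droplevels(df)
print(levels(df_clean$x))
```

Running `table()` on the grouping variable before any pairwise test is a cheap check: any count of zero points to a level that should be dropped first.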