I want to test for sexual dependency in my data set, which consists of ordinal data. That is, I have the sexes male (coded as 1) and female (coded as 2), and several traits (T1, T2, T3, ...) on different ordinal scales (some ranging from 0 to 2, others from 0 to 5, or in words from "not present" to "strongly expressed"). Additionally, there are quite a few missing entries (NA) in the ordinal trait data.
sex | T1 |
---|---|
1 | 0 |
2 | 2 |
1 | NA |
2 | 1 |
2 | 0 |
To test for sexual dependency, I want to use Kendall's tau coefficient. For this, I used `cor()` and `cor.test()` with `method = "kendall"`. However, I am not sure if I did it correctly. The outcome of `cor()` makes me unsure:
cor(data$sex, data$T1, method="kendall")
[1] NA
cor.test(data$sex, data$T1, method="kendall")
Kendall's rank correlation tau
data: data$sex and data$T1
z = 0.052821, p-value = 0.9579
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.0120125
What does the NA mean? And is the result still reliable? Or did I make a mistake? Are there any other suggestions to test for sexual dependency in ordinal traits? Normally in such a study design, the ordinal data would have been dichotomized (0 and 1) and Fisher's Exact Test would have been used. However, dichotomizing is not my aim and I need to retain the ordinal scale.
As the comments and other answers note, `cor()` defaults to `use = "everything"`, so a single NA in either vector makes the whole result NA. `cor.test()`, by contrast, drops incomplete pairs before computing the statistic, so the tau it reports is based on the complete observations only. There are a couple of ways around the NA from `cor()`, shown below. First, I recreated your data:
#### Recreate Data ####
sex <- c(1,2,1,2,2)
t1 <- c(0,2,NA,1,0)
df <- data.frame(sex,t1)
df
Then, with the argument `use = "complete.obs"`, you can get the Kendall correlation computed on the complete pairs only:
#### Base R Method ####
cor(sex,
t1,
use = "complete.obs",
method = "kendall")
Shown below:
[1] 0.5163978
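To connect this back to the output in the question, here is a minimal sketch on the same recreated vectors. It compares the default `cor()` call, the `use = "complete.obs"` call, and the estimate returned by `cor.test()`. The variable names `tau_default`, `tau_complete`, and `tau_test` are only illustrative, and `suppressWarnings()` is there because ties in such small ordinal data trigger a warning about exact p-values.

```r
# Same toy data as above
sex <- c(1, 2, 1, 2, 2)
t1  <- c(0, 2, NA, 1, 0)

# Default use = "everything": one NA in the inputs gives NA as the result
tau_default <- cor(sex, t1, method = "kendall")

# Only pairs where both values are present are used
tau_complete <- cor(sex, t1, use = "complete.obs", method = "kendall")

# cor.test() removes incomplete pairs on its own before estimating tau
tau_test <- suppressWarnings(cor.test(sex, t1, method = "kendall"))$estimate

is.na(tau_default)                         # TRUE
all.equal(unname(tau_test), tau_complete)  # TRUE
```

So the NA from `cor()` and the tau from `cor.test()` do not contradict each other; they only differ in how missing values are handled.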
Additionally, you can use the `correlation()` function from the package of the same name, which automatically drops NA values:
#### Correlation Package ####
correlation::correlation(df, method = "kendall")
Shown below:
# Correlation Matrix (kendall-method)
Parameter1 | Parameter2 | tau | 95% CI | z | p
-------------------------------------------------------------
sex | t1 | 0.52 | [-1.00, 1.00] | 0.94 | 0.346
p-value adjustment method: Holm (1979)
Observations: 4
The advantages of this function are: 1) you can use a `dplyr` workflow to select, filter, etc. and apply this function afterwards; 2) it returns a self-contained table with your CIs, test statistics, p-values, etc.; 3) it shows how many observations were used, which the base R function does not report.
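If you stay with base R, the number of observations actually used can be recovered by hand. This is a small sketch on the recreated vectors, using `complete.cases()`; the names `ok` and `n_used` are just illustrative.

```r
# Same toy data as above
sex <- c(1, 2, 1, 2, 2)
t1  <- c(0, 2, NA, 1, 0)

# TRUE for every pair in which neither value is missing
ok <- complete.cases(sex, t1)

# Number of pairs that enter the correlation
n_used <- sum(ok)
n_used  # 4
```

With many traits and many missing entries, it may be worth checking this count per trait, since each trait can end up with a different effective sample size.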