I am trying to calculate the correlation between x (a continuous variable) and y (a categorical variable) in R.
The function biserial in the psych package is used to calculate this. See here.
But when I actually used it, I got a warning message and NA as the correlation:
Warning message:
In biserialc(x[, j], y[, i], j, i) : For x = 1 y = 1 y is not dichotomous
Does anyone actually use this function and get the right results?
UPDATE:
Here is the reproducible code:
library(psych)
x=c(5,3,4,8,7,7,4,9,6,8,11,5,1,4,4,9,5,9,10,2,9,3,6,9,3,9,7,14,7,6,8,10,6,10,2,8,6,4,12,11,1,8,7,7,12,6,5,6,8,9)
y=c(2,3,2,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,1,1,1,1,1,1,1,2,3,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,3,1,1,1,1,1)
biserial(x,y)
The output is
Biserial | | 0%
[,1]
[1,] NA
Warning message:
In biserialc(x[, j], y[, i], j, i) : For x = 1 y = 1 y is not dichotomous
Thanks!
Since y is not dichotomous, it doesn't make sense to use biserial(). From the documentation:
The biserial correlation is between a continuous y variable and a dichotmous x variable, which is assumed to have resulted from a dichotomized normal variable.
Your y has three levels (1, 2 and 3), so use polyserial() instead, which allows more than 2 levels.
polyserial() requires matrices as input, so structure your command like this:
> polyserial(as.matrix(x), as.matrix(y))
[,1]
[1,] 0.2672098
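If it helps, here is a minimal sketch of checking how many categories y has before picking the function. count_levels is a hypothetical helper written for this example, not part of psych, and the psych call is skipped if the package is not installed.

```r
# Data from the question: x is continuous, y is categorical
x <- c(5,3,4,8,7,7,4,9,6,8,11,5,1,4,4,9,5,9,10,2,9,3,6,9,3,9,7,14,7,6,8,10,6,10,2,8,6,4,12,11,1,8,7,7,12,6,5,6,8,9)
y <- c(2,3,2,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,1,1,1,1,1,1,1,2,3,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,3,1,1,1,1,1)

# Hypothetical helper: number of distinct non-missing categories in y
count_levels <- function(y) {
  length(unique(y[!is.na(y)]))
}

n_levels <- count_levels(y)
print(table(y))  # frequency of each category

if (requireNamespace("psych", quietly = TRUE)) {
  if (n_levels == 2) {
    # exactly two categories: biserial() applies
    print(psych::biserial(x, y))
  } else if (n_levels > 2) {
    # more than two ordered categories: use polyserial()
    print(psych::polyserial(as.matrix(x), as.matrix(y)))
  }
}
```

Both functions assume the categories are an ordered cut of an underlying normal variable, so this only makes sense if the levels of y have a natural order.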