
How to use biserial to calculate the correlation between a continuous and a categorical variable?


I am trying to calculate the correlation between x (continuous variable) and y (categorical variable) in R.

The biserial() function in the psych package is supposed to calculate this.

But when I actually used it, I got a warning message and NA as the correlation:

Warning message:
In biserialc(x[, j], y[, i], j, i) : For x = 1 y = 1 y is not dichotomous

Has anyone used this function and gotten correct results?

UPDATE:

Here is the reproducible code:

library(psych)
x <- c(5,3,4,8,7,7,4,9,6,8,11,5,1,4,4,9,5,9,10,2,9,3,6,9,3,9,7,14,7,6,8,10,6,10,2,8,6,4,12,11,1,8,7,7,12,6,5,6,8,9)
y <- c(2,3,2,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,1,1,1,1,1,1,1,2,3,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,3,1,1,1,1,1)
biserial(x, y)

The output is

 Biserial |                                                                                                            |   0%
       [,1]
 [1,]   NA
 Warning message:
In biserialc(x[, j], y[, i], j, i) : For x = 1 y = 1 y is not dichotomous

Thanks!


Solution

  • Since y is not dichotomous, it doesn't make sense to use biserial(). From the documentation:

    The biserial correlation is between a continuous y variable and a dichotomous x variable, which is assumed to have resulted from a dichotomized normal variable.

    Instead, use polyserial(), which allows more than two levels.

    polyserial() requires matrices as input, so structure your command like this:

    > polyserial(as.matrix(x), as.matrix(y))
              [,1]
    [1,] 0.2672098
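
To confirm the diagnosis before choosing a function, it can help to count the distinct values of y first. The following is only a minimal sketch under stated assumptions: `serial_cor` is a hypothetical helper name made up for illustration (it is not part of psych), and it assumes y is coded as ordered integers and that the psych package is installed when the helper is actually called.

```r
# Data from the question
x <- c(5,3,4,8,7,7,4,9,6,8,11,5,1,4,4,9,5,9,10,2,9,3,6,9,3,9,7,14,7,6,8,10,6,10,2,8,6,4,12,11,1,8,7,7,12,6,5,6,8,9)
y <- c(2,3,2,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,1,1,1,1,1,1,1,2,3,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,3,1,1,1,1,1)

# Count observations per level of y; more than two levels means
# y is not dichotomous, which is what the warning complains about
print(table(y))
n_levels <- length(unique(y))

# Hypothetical helper (not a psych function): choose between the two
# psych calls discussed above based on the number of levels in y
serial_cor <- function(x, y) {
  k <- length(unique(y))
  if (k == 2) {
    # dichotomous y: biserial() applies
    psych::biserial(x, y)
  } else if (k > 2) {
    # more than two ordered levels: polyserial() with matrix input
    psych::polyserial(as.matrix(x), as.matrix(y))
  } else {
    stop("y needs at least two distinct values")
  }
}
```

With the data from the question, `n_levels` is 3, so `serial_cor(x, y)` would take the polyserial() branch rather than calling biserial().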