Using glm with family=binomial(link='logit') on a continuous variable between 0 and 1 gives an error


I'm trying to use glm to fit a logistic regression to a continuous response between 0 and 1 with the following code, but I get this error:

> glm(y ~ x, data=test_data, family=binomial(link = 'logit'))
Error in eval(family$initialize) : y values must be 0 <= y <= 1

However, summary(test_data) shows that the y values in the data frame lie entirely between 0 and 1:

> summary(test_data)
       y                  x         
 Min.   :0.000000   Min.   :0.0000  
 1st Qu.:0.001510   1st Qu.:0.0000  
 Median :0.003664   Median :1.0000  
 Mean   :0.025847   Mean   :0.5386  
 3rd Qu.:0.009054   3rd Qu.:1.0000  
 Max.   :1.000000   Max.   :1.0000

Can anyone help me understand what the issue here is? If I check the type of the variables, they are both numeric:

> class(test_data$y)
[1] "numeric"
> class(test_data$x)
[1] "numeric"

Solution

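  • I suggest checking for values that fall outside [0, 1]. summary() rounds what it prints, so a value such as 1.0000001 would still be displayed as 1.000000. The error comes from the check that binomial() applies to the response, so the y line is the one that matters most:

    # Row indices of values outside [0, 1]; an empty result means none were found.
    which(as.numeric(test_data$x) < 0 | as.numeric(test_data$x) > 1)
    which(as.numeric(test_data$y) < 0 | as.numeric(test_data$y) > 1)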
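To illustrate why the check above can find something that summary() hides, here is a minimal sketch. The data frame `demo`, its values, and the column `y_clamped` are made up for illustration, and the assumption that the real data contains a value just above 1 is only a guess at the cause. Clamping is one possible repair, not necessarily the right one for your data.

```r
# Hypothetical data: the last response value sits a hair above 1 (assumed cause).
demo <- data.frame(
  x = rep(c(0, 1), each = 5),
  y = c(0, 0.0015, 0.0037, 0.0091, 0.02, 0.001, 0.005, 0.3, 0.9, 1 + 1e-9)
)

# summary() rounds its display, so the maximum looks like exactly 1.
summary(demo$y)

# The exact comparison still flags the offending row.
bad <- which(demo$y < 0 | demo$y > 1)
bad

# glm() fails on the raw data; capture the error message instead of stopping.
fit_raw <- tryCatch(
  glm(y ~ x, data = demo, family = binomial(link = "logit")),
  error = function(e) conditionMessage(e)
)
fit_raw

# One possible repair (an assumption, not the only option): clamp to [0, 1].
demo$y_clamped <- pmin(pmax(demo$y, 0), 1)

# Fractional responses trigger a "non-integer #successes" warning under
# binomial; it is suppressed here only to keep the demonstration quiet.
fit_ok <- suppressWarnings(
  glm(y_clamped ~ x, data = demo, family = binomial(link = "logit"))
)
coef(fit_ok)
```

If `bad` comes back empty on your own data, the cause is something else and clamping will not help. Printing `range(test_data$y)` with `digits = 17` is another way to see values that the default display rounds away.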