Search code examples
rlogistic-regressionpanel-data

Logistic Unit Fixed Effect Model in R


I'm trying to estimate a logistic unit fixed effects model for panel data using R. My dependent variable is binary and measured daily over two years for 13 locations. The goal of this model is to predict the value of y for a particular day and location based on x.

zero <- seq(from=0, to=1, by=1)
ids = dplyr::data_frame(location=seq(from=1, to=13, by=1))
dates = dplyr::data_frame(date = seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by="days"))
data = merge(dates, ids)
data$y <- sample(zero, size=9503, replace=TRUE)
data$x <- sample(zero, size=9503, replace=TRUE)

While surveying the available packages to do so, I've read a number of ways to (apparently) do this, but I'm not confident I've understood the differences between packages and approaches.

From what I have read so far, glm(), survival::clogit() and pglm::pglm() can be used to do this, but I'm wondering if there are substantial differences between the packages and what those might be. Here are the calls I've used: fixed <- glm(y ~ x + factor(location), data=data) fixed <- clogit(y ~ x + strata(location), data=data)

One of the reasons for this insecurity is the error I get when using pglm (also see this question) that pglm can't use the "within" model: fixed <- pglm(y ~ x, data=data, index=c("location", "date"), model="within", family=binomial("logit")).

What distinguishes the "within" model of pglm from the approaches in glm() and clogit() and which of the three would be the correct one to take here when trying to predict y for a given date and unit?


Solution

  • I don't see that you have defined a proper hypothesis to test within the context of what you are calling "panel data", but as far as getting glm to give estimates for logistic coefficients within strata it can be accomplished by adding family="binomial" and stratifying by your "unit" variable:

    > fixed <- glm(y ~ x + strata(unit), data=data, family="binomial")
    > fixed
    
    Call:  glm(formula = y ~ x + strata(unit), family = "binomial", data = data)
    
    Coefficients:
            (Intercept)                    x   strata(unit)unit=2   strata(unit)unit=3  
                0.10287             -0.05910             -0.08302             -0.03020  
     strata(unit)unit=4   strata(unit)unit=5   strata(unit)unit=6   strata(unit)unit=7  
               -0.06876             -0.05042             -0.10200             -0.09871  
     strata(unit)unit=8   strata(unit)unit=9  strata(unit)unit=10  strata(unit)unit=11  
               -0.09702              0.02742             -0.13246             -0.04816  
    strata(unit)unit=12  strata(unit)unit=13  
               -0.11449             -0.16986  
    
    Degrees of Freedom: 9502 Total (i.e. Null);  9489 Residual
    Null Deviance:      13170 
    Residual Deviance: 13170    AIC: 13190
    

    That will not take into account any date-ordering, which is what I would have expected to be the interest. But as I said above, there doesn't yet appear to be a hypothesis that is premised on any sequential ordering.

    This would create a fixed effects model that included a spline relationship of date to probability of y-event. I chose to center the date rather than leaving it as a very large integer:

    library(splines)
    fixed <- glm(y ~ x + ns(scale(date),3) + factor(unit), data=data, family="binomial")
    fixed
    #----------------------
    Call:  glm(formula = y ~ x + ns(scale(date), 3) + factor(unit), family = "binomial", 
        data = data)
    
    Coefficients:
            (Intercept)                    x  ns(scale(date), 3)1  ns(scale(date), 3)2  
                0.13389             -0.05904              0.04431             -0.10727  
    ns(scale(date), 3)3        factor(unit)2        factor(unit)3        factor(unit)4  
               -0.03224             -0.08302             -0.03020             -0.06877  
          factor(unit)5        factor(unit)6        factor(unit)7        factor(unit)8  
               -0.05042             -0.10201             -0.09872             -0.09702  
          factor(unit)9       factor(unit)10       factor(unit)11       factor(unit)12  
                0.02742             -0.13246             -0.04816             -0.11450  
         factor(unit)13  
               -0.16987  
    
    Degrees of Freedom: 9502 Total (i.e. Null);  9486 Residual
    Null Deviance:      13170 
    Residual Deviance: 13160    AIC: 13200