r output regression dummy-variable plm

Dummies not included in summary


I want to create a function that runs a panel regression including dummy variables for a three-level factor.

Consider a within model with time effects:

library(plm)

fit_panel_lr <- function(y, x) {
  x[, length(x) + 1] <- y
 
  #adding dummies
  mtx <- matrix(0, nrow = nrow(x), ncol = 3)
  mtx[cbind(seq_len(nrow(mtx)), 1 + (as.integer(unlist(x[, 2])) - min(as.integer(unlist(x[, 2])))) %% 3)] <- 1
  colnames(mtx) <- paste0("dummy_", 1:3)
  #converting to pdataframe and adding dummy variables
  x <- pdata.frame(x)
  x <- cbind(x, mtx)

  #performing panel regression 
  varnames <- names(x)[3:(length(x))]
  varnames <- varnames[!(varnames == names(y))]
  form     <- paste0(varnames, collapse = "+")
  x_copy   <- data.frame(x)
  form     <- as.formula(paste0(names(y), "~", form,'-1'))
  params   <- list(
    formula = form, data = x_copy, model = "within",
    effect = "time"
  )
  pglm_env <- list2env(params, envir = new.env())

  model_plm <- do.call("plm", params, envir = pglm_env)

  model_plm
}

However, if I use this data:

data("EmplUK", package="plm")
dep_var<-EmplUK['capital']
df1<-EmplUK[-6]

I get the following output:

>  fit_panel_lr(dep_var, df1)

Model Formula: capital ~ sector + emp + wage + output + dummy_1 + dummy_2 + 
    dummy_3 - 1
<environment: 0x000001ff7d92a3c8>

Coefficients:
   sector       emp      wage    output 
-0.055179  0.328922  0.102250 -0.002912 

Why do the dummies appear in the formula but not in the coefficients? Is there a rational explanation, or did I do something wrong?


Solution

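  • The dummies do not appear in the output because they are linearly dependent on the other data after the fixed-effect time transformation. In your code each dummy is derived only from the year, so it is constant within every time period and the time-demeaning reduces it to a column of zeros. Such columns are dropped, so only what is estimable is estimated and printed.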

    Find below some code picking up your example from above:

    library(plm)
    data("EmplUK", package = "plm")
    # rebuild the dummy matrix from your question (three dummies derived from year)
    mtx <- matrix(0, nrow = nrow(EmplUK), ncol = 3)
    mtx[cbind(seq_len(nrow(mtx)), 1 + (EmplUK$year - min(EmplUK$year)) %% 3)] <- 1
    colnames(mtx) <- paste0("dummy_", 1:3)
    dat <- cbind(EmplUK, mtx)
    pdat <- pdata.frame(dat) # first two columns (firm, year) are used as the index
    rhs <- paste(c("emp", "wage", "output", "dummy_1", "dummy_2", "dummy_3"), collapse = "+")
    form <- paste("capital ~" , rhs)
    form <- formula(form)
    mod <- plm(form, data = pdat, model = "within", effect = "time")
    detect.lindep(mod$model) # before FE time transformation (original data) -> nothing offending
    detect.lindep(model.matrix(mod)) # after FE time transformation -> dummies are offending
    

    The help page for detect.lindep (?detect.lindep, included in package plm) has further examples of linear dependence before and after the FE transformation.
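
    To see the mechanism without plm, the following base-R sketch applies a time-demeaning by hand to a small made-up panel. The data frame toy, its columns and the helper demean_by_year are hypothetical names chosen for illustration; the helper only mimics the idea of the time-effects within transformation (subtracting per-year means) and is not plm's internal code.

    ```r
    # Toy balanced panel: 3 firms observed in 6 years (hypothetical data)
    set.seed(1)
    toy <- expand.grid(firm = 1:3, year = 1976:1981)
    toy$x <- rnorm(nrow(toy))

    # Dummy that depends only on the year, built like dummy_1 in the question
    toy$dummy_1 <- as.numeric((toy$year - min(toy$year)) %% 3 == 0)

    # Hand-rolled time demeaning: subtract the per-year mean from each value
    demean_by_year <- function(v, year) v - ave(v, year)

    x_within     <- demean_by_year(toy$x, toy$year)
    dummy_within <- demean_by_year(toy$dummy_1, toy$year)

    # The dummy is constant within each year, so nothing is left of it
    dummy_is_zero <- all(dummy_within == 0)
    # x varies across firms within a year, so it keeps some variation
    x_survives <- any(x_within != 0)

    print(c(dummy_is_zero = dummy_is_zero, x_survives = x_survives))
    ```

    Any regressor that varies only over time behaves like dummy_1 here, which is why such columns cannot be estimated alongside time fixed effects.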

    A suggestion: to construct the dummy variables, use an R factor with three levels rather than building the dummy matrix yourself. Using a factor is typically more convenient and less error-prone. It is converted to binary dummies (treatment coding) by typical estimation functions via the model.frame/model.matrix framework.
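
    A minimal sketch of the factor approach, using base R only. The variable name period and the labels level_1 to level_3 are assumptions made for this illustration; the grouping rule is the same modulo-3 rule as in the question.

    ```r
    # Years as in EmplUK (1976 to 1984), one value per year for illustration
    year <- 1976:1984

    # Three-level factor replacing the hand-built dummy matrix
    period <- factor((year - min(year)) %% 3 + 1,
                     labels = paste0("level_", 1:3))

    # model.matrix() expands the factor into treatment-coded dummies:
    # an intercept plus one column for each non-reference level
    mm <- model.matrix(~ period)
    print(colnames(mm))
    print(head(mm, 3))
    ```

    In a model formula the factor would simply be written as a regressor (for example capital ~ emp + period). Note that a factor derived only from the year is still constant within each time period, so with effect = "time" it would be dropped for the same reason as the hand-built dummies.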