r panel dummy-variable

R transform dummies into factor variable


I have a panel dataset in which the time and group variables have already been converted to dummies. I want to reverse that transformation and get back a simple id variable and a simple time variable.

Let's create some comparable data:

library(plm)
library(tidyverse)
library(fastDummies)
data(EmplUK)

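# Drop `sector`, then replace `firm` and `year` with dummy columns,
# omitting the first dummy of each variable (the reference level)
EmplUK %>%
  select(-sector) %>%
  dummy_cols(.data = .,
             select_columns = c("firm", "year"),
             remove_selected_columns = TRUE,
             remove_first_dummy = TRUE) -> paneldata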
head(paneldata)

So now all my dummy variables are named firm_X and year_X, and I would like to have a single Year and a single Firm variable again. This is slightly complicated by the fact that the dummies for the first firm and the first year do not exist (they would not be needed in a regression model). I'm fine with these columns being missing, because the values can be inferred: a row with no firm dummy set belongs to Firm 1, and a row with no year dummy set belongs to Year 1976, which is one less than the smallest year that has a dummy.
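The inference described above can be sketched on a tiny example. The toy data below is hypothetical (three firms, with the dummy for firm 1 omitted) and only illustrates the idea that a row with every remaining dummy equal to zero must belong to the omitted reference level.

```r
library(dplyr)

# Hypothetical toy data: three firms, dummy for firm 1 omitted
toy <- tibble(
  firm_2 = c(0, 1, 0),
  firm_3 = c(0, 0, 1)
)

# A row with no firm dummy set must belong to the omitted reference firm
is_reference <- rowSums(select(toy, starts_with("firm_"))) == 0
is_reference
```

Only the first row has no dummy set, so only that row is assigned to the reference firm.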

Any ideas how to do this nicely? Ideally using tidyverse?


Solution

  • After some thinking, I figured it out and created a small function:

    getfactorback <- function(data,
                               groupdummyprefix,
                               timedummyprefix,
                               grouplabel,
                               timelabel,
                               firstgroup,
                               firsttime) {
      
      data %>% 
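        # Rebuild the omitted reference dummies: a row belongs to the first
        # group/time exactly when none of its remaining dummies is set.
        # The dummy columns are selected via the prefix arguments.
        # pick() needs dplyr >= 1.1.0; on older versions use cur_data() %>% select(...).
        mutate(newgroup = ifelse(rowSums(pick(starts_with(groupdummyprefix))) == 1, 0, 1),
               newtime = ifelse(rowSums(pick(starts_with(timedummyprefix))) == 1, 0, 1)) %>%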
        
        rename(!!paste0(groupdummyprefix,firstgroup):=newgroup,
               !!paste0(timedummyprefix,firsttime):=newtime) %>%
        
        
        pivot_longer(cols = starts_with(groupdummyprefix),names_to = grouplabel,names_prefix = groupdummyprefix) %>%
        filter(value == 1) %>%
        select(-value) %>%
        
        pivot_longer(cols = starts_with(timedummyprefix),names_to = timelabel,names_prefix = timedummyprefix) %>%
        filter(value == 1) %>%
        select(-value)  %>%
        
        mutate(across(.cols = c(all_of(grouplabel),all_of(timelabel)),factor)) %>%
        relocate(all_of(c(grouplabel,timelabel))) -> output
      
      return(output)
      
    }
    
    getfactorback(data = paneldata,
                  groupdummyprefix = "firm_",
                  grouplabel = "firm",
                  timedummyprefix = "year_",
                  timelabel = "year",
                  firstgroup = "1",
                  firsttime = 1976)
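
To see the mechanics on something small, here is a minimal sketch that walks through the same steps as the function above on a hypothetical toy panel. The column names `y`, `firm_2` and `year_2001`, and the reference levels firm 1 and year 2000, are made up for illustration. With only one remaining dummy per variable, the reference dummy is simply its complement.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy panel: firm_1 and year_2000 are the omitted reference levels
toy <- tibble(
  y         = c(10, 11, 20, 21),
  firm_2    = c(0, 0, 1, 1),
  year_2001 = c(0, 1, 0, 1)
)

recovered <- toy %>%
  # Reference dummies: 1 where the remaining dummy of that kind is 0
  mutate(firm_1    = as.integer(firm_2 == 0),
         year_2000 = as.integer(year_2001 == 0)) %>%
  # One row per (observation, firm dummy); names_prefix strips "firm_"
  pivot_longer(starts_with("firm_"), names_to = "firm", names_prefix = "firm_") %>%
  # Keep only the dummy that is set, then drop the helper column
  filter(value == 1) %>%
  select(-value) %>%
  # Same again for the time dummies
  pivot_longer(starts_with("year_"), names_to = "year", names_prefix = "year_") %>%
  filter(value == 1) %>%
  select(-value)

recovered
```

Each pivot temporarily multiplies the number of rows by the number of dummies, and the filter brings it back to one row per observation, so the result should have as many rows as the input. The recovered columns are character at this point, which is why the function converts them with factor() at the end.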