Search code examples
rreshapetidytext

R: Opposite to aggregate using tidytext::unnest_tokens. Multiple variables and upper case


Following up on this question, I want to perform a task opposite to aggregate (or the data.table equivalent as in the MWE below), so that I obtain df1 again, starting from df2.

The task here then is to reproduce df1 from df2. For this, I tried tidytext::unnest_tokens, but I cannot figure out how to make it work properly when more than one variable have to be "dis-aggregated" (models, countries, and years).

It would be nice to retain the original upper case of the variables as well.

Any elegant solution different from tidytext::unnest_tokens would be accepted! Thanks!

Here is the MWE:

####MWE
library(data.table)
library(tidytext)
df1 <- data.frame(brand=c(rep('A',4), rep('B',5), rep('C',3), rep('D',2),'E'),
                  model=c('A1','A1','A2','A3','B1','B2','B2','B2','B3','C1','C1','C2','D1','D2','E1'),
                  country=c('P','G','S','S','P','P','F','I','D','S','F','F','G','I','S'),
                  year=c(91,92,93,94,98,95,87,99,00,86,92,92,93,95,99))
df1
dd <- data.table(df1)
df2 <- as.data.frame(dd[, list(models=paste(model, collapse=' /// '),
                               countries=paste(country, collapse=' /// '),
                               years=paste(year, collapse=' /// ')),
                        by=list(brand=brand)])
df2
df1b <- df2 %>% 
  unnest_tokens(model, models, token = "regex", pattern = " /// ")
df1b
####

Solution

  • We can use separate_rows

    library(tidyverse)
    res <- df2 %>% 
             separate_rows(models, countries, years, convert = TRUE) %>%
             rename_all(funs(paste0(names(df1)))) %>% #just to make the column names same as df1
             mutate(year = as.numeric(year)) #convert to numeric to match df1 column type
    all.equal(res, df1 %>% 
                      mutate_at(2:3, as.character), check.attributes = FALSE )
    #[1] TRUE