Tags: r, string, tidyverse, dummy-variable, dummy-data

Turn column containing list into dummies


I have a data frame with a column that holds a list of (space-separated) years, and I would like to turn it into one dummy column per year.

Consider the following toy data:

raw <- data.frame(textcol = c("case1", "case2", "case3"), years=c('1996 1997 1998','1997 1999 2000', '1996 1998 2000'))


  textcol          years
1   case1 1996 1997 1998
2   case2 1997 1999 2000
3   case3 1996 1998 2000

I would now like to transform the data frame into this:

  textcol `1996` `1997` `1998` `1999` `2000` 
1   case1      1      1      1      0      0
2   case2      0      1      0      1      1
3   case3      1      0      1      0      1

I tried using separate() and str_split() to no avail. Can someone point me to the right approach?


Solution

  • Use separate_rows to put each year in its own row, then cross-tabulate the result with table. (Append %>% as.data.frame.matrix to the pipeline if you want a data frame rather than a table.)

    library(tidyr)
    
    tab <- raw %>% separate_rows(years) %>% table
    

    giving:

    tab
    ##        years
    ## textcol 1996 1997 1998 1999 2000
    ##   case1    1    1    1    0    0
    ##   case2    0    1    0    1    1
    ##   case3    1    0    1    0    1
    

    We can also display this as a bipartite graph. Convert tab to an igraph object, g. Then create a custom layout, lay, that keeps the vertices in their original order, because the usual bipartite layout in igraph reorders them to minimize edge crossings. Finally, plot it.

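    library(igraph)
    
    # Build a bipartite graph from the case-by-year table
    # (recent igraph releases rename this function to graph_from_biadjacency_matrix)
    g <- graph_from_incidence_matrix(tab)
    # Within each layer (V2), sort the x coordinates (V1) so vertices keep their original order
    lay <- with(as.data.frame(layout_as_bipartite(g)), 
      cbind(ave(V1, V2, FUN = sort), V2))
    plot(g, layout = lay, vertex.size = 2)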
    

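If you want the exact layout from the question (textcol as an ordinary column followed by one 0/1 column per year), the as.data.frame.matrix step mentioned above can be taken a little further. This is only a minimal sketch built on the toy data from the question; the name dummies is made up for illustration, and it assumes the years are separated by spaces as shown.

```r
library(tidyr)

raw <- data.frame(
  textcol = c("case1", "case2", "case3"),
  years = c("1996 1997 1998", "1997 1999 2000", "1996 1998 2000")
)

# One row per (textcol, year) pair, then cross-tabulate into a 0/1 table
tab <- raw %>% separate_rows(years) %>% table

# as.data.frame.matrix keeps the wide layout, but textcol ends up in the
# row names, so copy it back into a regular column and reset the row names
dummies <- as.data.frame.matrix(tab)
dummies <- cbind(textcol = rownames(dummies), dummies)
rownames(dummies) <- NULL

dummies
```

The year columns keep their non-syntactic names, so refer to them with backticks or double brackets, for example dummies[["1996"]]. Note that table counts occurrences, so a year repeated within one row would give a value above 1 rather than a strict dummy.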