Search code examples
rpivotreshapecategorical-data

Rows to Columns in R for categorical data (w generic numbered cols)?


I need to turn long data into wide data, but with new numbered columns (col1, col2, etc.) created as needed. (In other words, the col names are not taken from keys, but generated generically.) So from something like this:

d <- data.frame(row_id = c(1,1,2,2,2,3,3,3),cat = c("a","b","a","d","e","d","f","g"))

  row_id cat
1      1   a
2      1   b
3      2   a
4      2   d
5      2   e
6      3   d
7      3   f
8      3   g

To something like this:

row_id,cat1,cat2,cat3
1,a,b,,
2,a,d,e
3,d,f,g

spread() and pivot_wider() don't really fit the bill, since I'm working with categorical values and a large number of categories (but no more than 8 per id). In python, I might iterate through the rows, appending new key:value pairs ("cat"+i:"a")to a dictionary within a list (or something), but not sure what the analogous method would be in R or if there's a better approach altogether.


Solution

  • We create a sequence column by 'row_id' and then pivot to wide with pivot_wider

    library(dplyr)
    library(tidyr)
    library(stringr)
    library(data.table)
    d %>% 
      mutate(rn = str_c('cat', rowid(row_id))) %>%
      pivot_wider(names_from = rn, values_from = cat)
    

    -output

    # A tibble: 3 x 4
    #  row_id cat1  cat2  cat3 
    #   <dbl> <chr> <chr> <chr>
    #1      1 a     b     <NA> 
    #2      2 a     d     e    
    #3      3 d     f     g