Search code examples
rsplitdplyrtidyrstrsplit

Split column of comma-separated numbers into multiple columns based on value


I have a column f in my dataframe that I would like to spread into multiple columns based on the values in that column. For example:

df <- structure(list(f = c(NA, "18,17,10", "12,8", "17,11,6", "18", 
                           "12", "12", NA, "17,11", "12")), .Names = "f", row.names = c(NA, 
                                                                                        10L), class = "data.frame")

df
#            f
#  1      <NA>
#  2  18,17,10
#  3      12,8
#  4   17,11,6
#  5        18
#  6        12
#  7        12
#  8      <NA>
#  9     17,11
#  10       12

How would I split column f into multiple columns indicating the numbers in the row. I'm interested in something like this:

          6    8    10   11   12   17   18    
1         0    0    0    0    0    0    0
2         0    0    1    0    0    1    1
3         0    1    0    0    1    0    0
4         1    0    0    1    0    1    0
5         0    0    0    0    0    0    1
6         0    0    0    0    1    0    0
7         0    0    0    0    1    0    0
8         0    0    0    0    0    0    0
9         0    0    0    1    0    1    0
10        0    0    0    0    1    0    0

I'm thinking I could useunique on the f column to create the seperate columns based on the different numbers and then do a grepl to determine if the specific number is in column f but I was wondering if there was a better way. Something similar to spread or separate in the tidyr package.


Solution

  • A solution using tidyr::separate_rows will be as:

    library(tidyverse)
    df %>% mutate(ind = row_number()) %>%
      separate_rows(f, sep=",") %>%
      mutate(f = ifelse(is.na(f),0, f)) %>%
      count(ind, f) %>%
      spread(f, n, fill = 0) %>%
      select(-2) %>% as.data.frame()
    
       # ind 10 11 12 17 18 6 8
    # 1    1  0  0  0  0  0 0 0
    # 2    2  1  0  0  1  1 0 0
    # 3    3  0  0  1  0  0 0 1
    # 4    4  0  1  0  1  0 1 0
    # 5    5  0  0  0  0  1 0 0
    # 6    6  0  0  1  0  0 0 0
    # 7    7  0  0  1  0  0 0 0
    # 8    8  0  0  0  0  0 0 0
    # 9    9  0  1  0  1  0 0 0
    # 10  10  0  0  1  0  0 0 0