Search code examples
rdataframedplyrdata-manipulation

Separate positive values into multiple rows on multiple columns


Suppose I have a data set like this:

dat <- tibble(id = 1:4, 
              col1 = c(0, 1, 1, 0),
              col2 = c(1, 0, 1, 0),
              col3 = c(1, 1, 0, 1))

> dat
# A tibble: 4 × 4
     id  col1  col2  col3
  <int> <dbl> <dbl> <dbl>
1     1     0     1     1
2     2     1     0     1
3     3     1     1     0
4     4     0     0     1

I'd like to separate, for every unique id, the multiple 1s into multiple rows, i.e. the expected output is:

# A tibble: 7 × 4
     id  col1  col2  col3
  <dbl> <dbl> <dbl> <dbl>
1     1     0     1     0
2     1     0     0     1
3     2     1     0     0
4     2     0     0     1
5     3     1     0     0
6     3     0     1     0
7     4     0     0     1

For the first id (id = 1), col2 and col3 are both 1, so I would like a separate row for each of them. It kinda is like one-hot encoding for rows.


Solution

  • With help from Ritchie Sacramento and RobertoT

    library(tidyverse)
    
    dat <- tibble(id = 1:4, 
                  col1 = c(0, 1, 1, 0),
                  col2 = c(1, 0, 1, 0),
                  col3 = c(1, 1, 0, 1))
    
    dat %>%  
      pivot_longer(-id) %>% 
      filter(value != 0) %>% 
      mutate(rows = 1:nrow(.)) %>% 
      pivot_wider(values_fill = 0, 
                  names_sort = TRUE) %>% 
      select(-rows)
    
    # A tibble: 7 × 4
         id  col1  col2  col3
      <int> <dbl> <dbl> <dbl>
    1     1     0     1     0
    2     1     0     0     1
    3     2     1     0     0
    4     2     0     0     1
    5     3     1     0     0
    6     3     0     1     0
    7     4     0     0     1