r  dataframe  time-series  tidyverse  sequential

How to create a wide sequential data set from time series and frequency data?


I have time series and frequency data like dat1, which contains an event_id for each event and the number of times it occurs (times).

What is the most elegant way in R to convert it into sequential wide data such as dat2, where each event is repeated according to its frequency?

dat1
id    event_no  event_id  times
P001  1         A         3
P001  2         B         1
P001  3         C         2
P001  4         D         5
P002  1         A         5
P002  2         B         3
P002  3         C         1
P002  4         D         1
P002  5         E         1

dat2
id    t1  t2  t3  t4  t5  t6  t7  t8  t9  t10  t11
P001  A   A   A   B   C   C   D   D   D   D    D
P002  A   A   A   A   A   B   B   B   C   D    E

Thanks


Solution

  • Using dplyr and tidyr, we can first repeat each row times times with uncount, then group by id and number the rows within each group (t1, t2, ...) to get a unique column key, and finally use pivot_wider to convert the data into wide format.

    library(dplyr)
    library(tidyr)
    
    df %>%
      uncount(times) %>%
      group_by(id) %>%
      mutate(event_no = paste0("t", row_number())) %>%
      pivot_wider(names_from = event_no, values_from = event_id)
      # With tidyr versions before 1.0.0, use spread() instead of pivot_wider():
      # spread(event_no, event_id)
    
    #  id    t1    t2    t3    t4    t5    t6    t7    t8    t9    t10   t11  
    #  <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
    #1 P001    A     A     A     B     C     C     D     D     D     D     D    
    #2 P002    A     A     A     A     A     B     B     B     C     D     E    
    

    data

    df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L), .Label = c("P001", "P002"), class = "factor"), event_no = c(1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L), event_id = structure(c(1L, 2L, 
    3L, 4L, 1L, 2L, 3L, 4L, 5L), .Label = c("A", "B", "C", "D", "E"
    ), class = "factor"), times = c(3L, 1L, 2L, 5L, 5L, 3L, 1L, 1L, 
    1L)), class = "data.frame", row.names = c(NA, -9L))
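
    For comparison, the same three steps (repeat rows, number them within each id, spread to wide) can be sketched in base R without dplyr or tidyr. This is only an illustration, not part of the answer above: the names dat1, long and wide are made up here, the data is rebuilt with character columns rather than factors, and rows are assumed to be already ordered by event_no within each id.

```r
# Same values as dat1 in the question, rebuilt with character columns
# (an assumption; the `df` in the answer uses factors).
dat1 <- data.frame(
  id       = c("P001", "P001", "P001", "P001",
               "P002", "P002", "P002", "P002", "P002"),
  event_no = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L),
  event_id = c("A", "B", "C", "D", "A", "B", "C", "D", "E"),
  times    = c(3L, 1L, 2L, 5L, 5L, 3L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Step 1: repeat each row `times` times (the job uncount() does above).
long <- dat1[rep(seq_len(nrow(dat1)), dat1$times), c("id", "event_id")]

# Step 2: number the rows within each id to build the keys t1, t2, ...
long$t <- paste0("t", ave(seq_len(nrow(long)), long$id, FUN = seq_along))

# Step 3: spread to wide form, one row per id and one column per key.
wide <- reshape(long, idvar = "id", timevar = "t", direction = "wide")

# reshape() names the new columns "event_id.t1", "event_id.t2", ...;
# strip the prefix so they read t1, t2, ... as in dat2.
names(wide) <- sub("^event_id\\.", "", names(wide))
rownames(wide) <- NULL

wide
```

    If the ids had different total counts, the shorter ones would be padded with NA in the trailing columns, which is also how pivot_wider behaves by default.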