Search code examples
rdplyr

Repeated event long format data to single multiple variable


I have a repeated observations in long format

ID      Approach Date
-42365 Sternotomy 18-11-2022
-42365 Thoracotomy 22-03-2024
-11234 Thoracotomy 12-03-2018
-11234 Sternotomy 17-05-2023

I have to convert it to this using dplyr or any other packge:

ID     Approach_1    Date_1   Approach_2 Date_2
-42365 Sternotomy  18-11-2022 Thoracotomy 22-03-2024
-11234 Thoracotomy 12-03-2018 Sternotomy 17-05-2023


Solution

  • The trick is to create a unique identifier since the ID alone isn't one, then play with pivot_*.

    df1 <- read.table(text = "ID      Approach Date
    -42365 Sternotomy 18-11-2022
    -42365 Thoracotomy 22-03-2024
    -11234 Thoracotomy 12-03-2018
    -11234 Sternotomy 17-05-2023", header = TRUE)
    
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    library(tidyr)
    
    df1 %>%
      mutate(cnt = seq_along(Approach), .by = ID) %>%
      pivot_longer(-c(ID, cnt)) %>%
      pivot_wider(
        id_cols = ID,
        names_from = c(name, cnt),
        values_from = value
      )
    #> # A tibble: 2 × 5
    #>       ID Approach_1  Date_1     Approach_2  Date_2    
    #>    <int> <chr>       <chr>      <chr>       <chr>     
    #> 1 -42365 Sternotomy  18-11-2022 Thoracotomy 22-03-2024
    #> 2 -11234 Thoracotomy 12-03-2018 Sternotomy  17-05-2023
    

    Created on 2024-10-14 with reprex v2.1.1