Search code examples
rtidyverse

Splitting an ID column based on ID + Phase of sample


data =

data.frame ( ID = c('101 1h','101 2h','101 3h','102 1h','102 2h','102 3h'), marker1 = c(1,1,2,2,2,3), marker2 = c(3,3,4,5,5,6)))

I would like to split the data into rows with just ID number and then three columns for each phase with its respective marker1 and marker2 result.

For example the first row should be ID = 101, phase = 1h, biomarker1 = 1,biomarker2 = 3

I have tried first separate() the ID column into ID number and phase. Then using the new phase column in a spread() command. The spread command however for each correct answer in for example column 1h, i would have two NA's in columns 2h and 3h. So if someone knows how to solve this i would love to hear.

Thank you.

Update: I think i realized my mistake, i was probably supposed to use gather() instead of spread().


Solution

  • This looks fairly straightforward with a combination of separate and pivot_wider from tidyr

    library(tidyr)
    
    separate(data, ID, c('ID', 'phase')) %>% 
      pivot_wider(names_from = 'phase', values_from = c('marker1', 'marker2'))
    #> # A tibble: 2 x 7
    #>   ID    marker1_1h marker1_2h marker1_3h marker2_1h marker2_2h marker2_3h
    #>   <chr>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
    #> 1 101            1          1          2          3          3          4
    #> 2 102            2          2          3          5          5          6