Search code examples
rdataframefunctiondplyrtidyverse

Reformatting a dataframe into wide format in R


I was wonder if there is a way for my DATA to be reformatted to my Desired_output below?

Specifically, for each unique study, we stick together a pair of pre and postNUMBER together, separately for t and c.

For example, for study == 2, we stick together a pair of pre and postNUMBER together going FROM:

study time   nt nc    mt    mc  sdt  sdc  outcome
2     pre    38 48  1.89  2.22 0.40 0.76  fram
2     post1  38 48  4.07  2.52 2.20 1.58  fram

TO:

study time            group  n    mpre     mpost  sdpre  sdpost outcome
2     pre-post1       t      38   1.89     4.07   0.40   2.20   fram
2     pre-post1       c      48   2.22     2.52   0.76   1.58   fram

ps. If a unique study in DATA, didn't have a pre (only consisted of postNUMBER) or postNUMBER (only consisted of pre), we simply leave it out.

DATA <- read.table(header=T, text="
study time   nt nc    mt    mc  sdt  sdc  outcome
1     pre    28 58  0.89  1.22 1.40 1.76  Conv
1     post1  28 58  5.07  3.52 3.20 2.58  Conv
1     post2  28 58  3.64  2.86 3.15 2.80  Conv
2     pre    38 48  1.89  2.22 0.40 0.76  fram
2     post1  38 48  4.07  2.52 2.20 1.58  fram
3     post1  31 18  2.07  1.52 1.20 0.58  voca
3     post2  32 18  3.07  2.32 1.12 9.85  voca
")

Desired_output <- read.table(header=T, text="
study time            group  n    mpre     mpost  sdpre  sdpost outcome
1     pre-post1       t      28   0.89     5.07   1.40   3.20   Conv
1     pre-post2       t      28   0.89     3.64   1.40   3.15   Conv
1     pre-post1       c      58   1.22     3.52   1.76   2.58   Conv
1     pre-post2       c      58   1.22     2.86   1.76   2.80   Conv
2     pre-post1       t      38   1.89     4.07   0.40   2.20   fram
2     pre-post1       c      48   2.22     2.52   0.76   1.58   fram
")

Solution

  • A tidyverse suggestion is to pivot_longer() in a first time, to have one row per study-time-group (and a column for n , m and sd). In a second time, you can left_join the "post" rows (filter(time!="pre") with "pre" rows (filter(time=="pre"); with the appropriate argument.

    I select some columns. Notice that I kept the row without pre with some NA ; they could be filtered if you want to leave them out.

    library(tidyverse)
    
    DATA <- read.table(header=T, text="
    study time   nt nc    mt    mc  sdt  sdc  outcome
    1     pre    28 58  0.89  1.22 1.40 1.76  Conv
    1     post1  28 58  5.07  3.52 3.20 2.58  Conv
    1     post2  28 58  3.64  2.86 3.15 2.80  Conv
    2     pre    38 48  1.89  2.22 0.40 0.76  fram
    2     post1  38 48  4.07  2.52 2.20 1.58  fram
    3     post1  31 18  2.07  1.52 1.20 0.58  voca
    3     post2  32 18  3.07  2.32 1.12 9.85  voca
    ")
    
    data2 = 
      DATA %>%  pivot_longer(nt:sdc, 
                             names_to = c(".value","group"),
                             names_pattern = "(n|m|sd)(t|c)") 
    
    left_join(data2 %>% filter(time !="pre"),
              data2 %>% filter(time =="pre"),
              by = c("study","group"),
              suffix= c("post","pre") ) %>% 
      select(study, time = timepost, group, npre, npost, mpre, mpost, sdpre,sdpost,
             outcome = outcomepost)
    #> # A tibble: 10 × 10
    #>    study time  group  npre npost  mpre mpost sdpre sdpost outcome
    #>    <int> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>  <dbl> <chr>  
    #>  1     1 post1 t        28    28  0.89  5.07  1.4    3.2  Conv   
    #>  2     1 post1 c        58    58  1.22  3.52  1.76   2.58 Conv   
    #>  3     1 post2 t        28    28  0.89  3.64  1.4    3.15 Conv   
    #>  4     1 post2 c        58    58  1.22  2.86  1.76   2.8  Conv   
    #>  5     2 post1 t        38    38  1.89  4.07  0.4    2.2  fram   
    #>  6     2 post1 c        48    48  2.22  2.52  0.76   1.58 fram   
    #>  7     3 post1 t        NA    31 NA     2.07 NA      1.2  voca   
    #>  8     3 post1 c        NA    18 NA     1.52 NA      0.58 voca   
    #>  9     3 post2 t        NA    32 NA     3.07 NA      1.12 voca   
    #> 10     3 post2 c        NA    18 NA     2.32 NA      9.85 voca
    

    Created on 2024-05-07 with reprex v2.0.2