Search code examples
rdataframelistfunctiontidyverse

Long to wide format based on variable suffixes in tidyverse in R


I wonder if there is a way for my DATA to be reformatted to my Desired_output below?

Specifically, for each unique study, we stick together a pair of pre and postNUMBER together, separately for T and C.

ps. If a unique study in DATA, didn't have a pre (only consisted of postNUMBER) or postNUMBER (only consisted of pre), we simply leave it out (in this case, Jasmi and Zill studies are left out).

I tried the following without success:

DATA <- read.table(header=TRUE,text="
study    year    mT   sdT    nT    mC   sdC    nC  time
Al_Ajmi  2015  1.68  1.07    25  1.44  1.08    31  pre
Al_Ajmi  2015  7.4   2.22    25  1.08  1.12    31  post1
Al_Ajmi  2015  8.08  1.75    25  1.48  1.08    31  post2
Jasmi    2020  1.3   1       30  2     1       30  pre
Zill     2019  2     1       41  3     2       32  post1
Zill     2019  3     3       41  1     1       32  post2")


Desired_output <- read.table(header=T, text="
study    year    time         group  n    mpre     mpost  sdpre  sdpost
Al_Ajmi  2015    pre-post1    T      25   1.68     7.4    1.07   2.22
Al_Ajmi  2015    pre-post2    T      25   1.68     8.08   1.07   1.75
Al_Ajmi  2015    pre-post1    C      31   1.44     1.08   1.08   1.12
Al_Ajmi  2015    pre-post2    C      31   1.44     1.48   1.08   1.08
")

# I tried the following without success:
library(tidyverse)
dat8 <- DATA %>% pivot_longer(nT:sdC, 
                        names_to = c(".value","group"),
                        names_pattern = "(n|m|sd)(T|C)") 

output <- 
  left_join(filter(dat8, time !="pre"),
            filter(dat8, time =="pre"),
            by = c("study","group"),
            suffix= c("post","pre") ) %>% 
  select(study:sdpost,npre:sdpre) %>% 
  rename_with(~str_remove(.,"post"), -c("mpost","sdpost")) %>% 
  mutate(time=recode(time, "post1"="pre-post1", "post2"="pre-post2")) %>% 
  arrange(study, group)

Solution

  • You are not pivotting all the columns:

    dat8 <- DATA %>% pivot_longer(
      mT:nC, # change here
      names_to = c(".value", "group"),
      names_pattern = "(m|n|sd)(T|C)"
    )
    

    Everything else seems correct?