Search code examples
rtidyrlong-format-datawide-format-data

Transform table to wide format where not all columns are in long


I'm trying to transform a complex dataframe such as (:

olddata_long =

A  B   C  SAMPLE_ID  1   2   3   4   5
X1 Y1  Z1 SAMPLE1   G1  H1  J1  K1  L1 
X1 Y1  Z1 SAMPLE2   G2  H2  J2  K2  L2 
X2 Y2  Z2 SAMPLE1   G3  H3  J3  K3  L3 
X2 Y2  Z2 SAMPLE2   G4  H4  J4  K4  L4 

Where 1-5 are unique to each sample. I'm trying to transform this to a "wide" format, such as:

A   B   C    SAMPLE1                      SAMPLE2
X1  Y1  Z1   1=G1;2=H1;3=J1;4=K1;5=L1     1=G2;2=H2;3=J2;4=K2;5=L2
X2  Y2  Z2   1=G3;2=H3;3=J3;4=K3;5=L3     1=G4;2=H4;3=J4;4=K4;5=L4

I've messed with transforming to long formats, but can't think of anyway of doing this for multiple variables.

data_wide <- spread(olddata_long, SAMPLE_ID, c(1,2,3,4,5))

Maybe it is easier to concatenate 1:5 then transform to wide format?

Edit: Thank you for your responses!

I tested all three version and all of them worked fine. I also microbenchmarked them in a real dataset which is way bigger (13mb txt file)

          min      lq       mean     median   uq       max         neval
MrFlick   161.9813 229.5798 256.5784 240.3421 250.6077 511.7215    20
stefan    108.9270 120.2335 160.4594 174.6441 185.0129 196.7983    20
TarJae    149.7109 163.1959 226.3037 207.8249 238.0539 420.1584    20

This is just to have an idea. I will give the correct answer to the one that was faster (to respond)


Solution

  • You can first make it longer, then do the transformation, then make it wider. For example

    library(dplyr, warn=FALSE)
    library(tidyr)
    
    
    dd %>% 
      pivot_longer(`1`:`5`) %>% 
      mutate(val=paste0(name, "=", value), name=NULL, value=NULL) %>%
      pivot_wider(names_from=SAMPLE_ID, values_from=val, values_fn=function(x) paste(x, collapse=";"))
    #   A     B     C     SAMPLE1                  SAMPLE2                 
    #   <chr> <chr> <chr> <chr>                    <chr>                   
    # 1 X1    Y1    Z1    1=G1;2=H1;3=J1;4=K1;5=L1 1=G2;2=H2;3=J2;4=K2;5=L2
    # 2 X2    Y2    Z2    1=G3;2=H3;3=J3;4=K3;5=L3 1=G4;2=H4;3=J4;4=K4;5=L4
    

    Tested with

    dd <- read.table(text="A  B   C  SAMPLE_ID  1   2   3   4   5
    X1 Y1  Z1 SAMPLE1   G1  H1  J1  K1  L1 
    X1 Y1  Z1 SAMPLE2   G2  H2  J2  K2  L2 
    X2 Y2  Z2 SAMPLE1   G3  H3  J3  K3  L3 
    X2 Y2  Z2 SAMPLE2   G4  H4  J4  K4  L4", header=T, check.names=F)