Search code examples
rtidyversespread

Cannot accurately convert from long format to wide in r


I am trying to convert from long format to wide with following code.

 data_ige<-read.csv("serology.csv",header = TRUE,na.strings=0)

library(tidyverse) library(magrittr)

data_new <- data_ige %>% spread(test, value)

I have following dataset existing dataset

After running the code, it transforms the date (but not in the way i want) as seen in the next image, items highlighted in yellow show that these values are coming in multiple rows but they should be in first row instead of a new row. each patient had data either for 1 visit or 2 visit. so all test results from 1 visit, i want to see them in one row and test results from visit 2 in second row.

After transformation

This screen shot shows the desired outcome.

desired outcome


Solution

  • We need to create a sequence column as there are duplicates

    library(dplyr)
    library(tidyr)
    data_ige %>% 
       group_by(ID, date, test) %>%
       mutate(rn = row_number()) %>%
       ungroup %>%
       spread(test, value) %>%
       #or use pivot_wider as spread is getting deprecated
       #  pivot_wider(names_from = test, values_from = value) %>%
       select(-rn)
    # A tibble: 8 x 9
    #  ID     date    `1`   `3`   `4`   `5`   `6`   `7`   `8`
    #  <fct> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #1 A      2008  0.035 NA    NA    NA    NA    NA    NA   
    #2 A      2011  2.75  NA    NA    NA    NA    NA    NA   
    #3 B      2011  9.99   3.65  0.68  0.02  0.17  0.5  NA   
    #4 C      2008  0     NA    NA    NA    NA    NA    NA   
    #5 C      2011 NA     NA    NA    NA    NA    NA     0.09
    #6 D      2008  0      0     0     0     0     0.59  0   
    #7 D      2011  0      0.49  0.2   0.08  0.16  0.5   0.13
    #8 D      2011  9.99  NA    NA    NA    NA    NA    NA   
    

    data

    data_ige <- structure(list(ID = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
    4L), .Label = c("A", "B", "C", "D"), class = "factor"), date = c(2008, 
    2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2011, 2008, 
    2008, 2008, 2008, 2008, 2008, 2008, 2011, 2011, 2011, 2011, 2011, 
    2011, 2011), test = c(1, 1, 1, 3, 4, 5, 6, 7, 8, 1, 1, 1, 3, 
    4, 5, 6, 7, 8, 1, 3, 4, 5, 6, 7, 8), value = c(0.035, 2.75, 9.99, 
    3.65, 0.68, 0.02, 0.17, 0.5, 0.09, 0, 0, 0, 0, 0, 0, 0, 0.59, 
    0, 9.99, 0.49, 0.2, 0.08, 0.16, 0.5, 0.13)), 
    class = "data.frame", row.names = c(NA, 
    -25L))