Search code examples
rpivotapplyspread

How to I split a date column and a result column into new columns for multiple observations per ID?


I would like to spread the date column into new columns (Date 1, Date 2, Date 3, etc.) for each observation in chronological order. I would also like to add a column with the corresponding result My data looks like this:

> CV
       Name Accession   MRN  Collected   Result
1 Doe, John       123 55555 2022-01-05 Detected
2 Doe, John       234 55555 2022-01-06 Negative
3 Doe, John       345 55555 2022-01-07 Detected
4 Doe, Jane       456 66666 2022-01-08 Negative
5 Doe, Jane       567 66666 2022-01-09 Negative
6 Doe, Jane       678 66666 2022-01-20 Negative

I would like it to look like this

Name           MRN        Date_1      Result_1      Date_2      Result_2     Date_3    Result_3        

Doe, John     55555     2022-01-05    Detected     2022-01-06   Negative   2022-01-07  Detected
Doe, Jane     66666     2022-01-08    Negative     2022-01-09   Negative   2022-01-20  Negative     

I think I may need to use the accession number as a key but I'm not really sure how to perform this?

data

Name <- c("Doe, John", "Doe, John", "Doe, John", "Doe, Jane", 
          "Doe, Jane", "Doe, Jane")
Accession <- c(123, 234, 345, 456, 567, 678)
MRN <- c(55555, 55555, 55555, 66666, 66666, 66666)
Collected <- c("2022-01-05", "2022-01-06", "2022-01-07", "2022-01-08", 
              "2022-01-09", "2022-01-20")
Result <- c("Detected", "Negative", "Detected", "Negative", 
            "Negative", "Negative")
CV <- data.frame(Name, Accession, MRN, Collected, Result)

Solution

  • You could replace the Accession by a running id (1, 2, 3, ...) and use reshape.

    > CV |> 
    +   transform(Accession=ave(Name, Name, FUN=seq_along)) |> 
    +   reshape(idvar=c('Name', 'MRN'), timevar='Accession', direction='wide')
           Name   MRN Collected.1 Result.1 Collected.2 Result.2 Collected.3 Result.3
    1 Doe, John 55555  2022-01-05 Detected  2022-01-06 Negative  2022-01-07 Detected
    4 Doe, Jane 66666  2022-01-08 Negative  2022-01-09 Negative  2022-01-20 Negative
    

    If you want the column to be named Date instead of Collected add a line that does the cosmetics.

    > CV |> 
    +   transform(Accession=ave(Name, Name, FUN=seq_along)) |> 
    +   reshape(idvar=c('Name', 'MRN'), timevar='Accession', direction='wide') |> 
    +   {\(.) setNames(., sub('Collected', 'Date', names(.)))}()
           Name   MRN     Date.1 Result.1     Date.2 Result.2     Date.3 Result.3
    1 Doe, John 55555 2022-01-05 Detected 2022-01-06 Negative 2022-01-07 Detected
    4 Doe, Jane 66666 2022-01-08 Negative 2022-01-09 Negative 2022-01-20 Negative
    

    Data:

    > dput(CV)
    structure(list(Name = c("Doe, John", "Doe, John", "Doe, John", 
    "Doe, Jane", "Doe, Jane", "Doe, Jane"), Accession = c(123, 234, 
    345, 456, 567, 678), MRN = c(55555, 55555, 55555, 66666, 66666, 
    66666), Collected = c("2022-01-05", "2022-01-06", "2022-01-07", 
    "2022-01-08", "2022-01-09", "2022-01-20"), Result = c("Detected", 
    "Negative", "Detected", "Negative", "Negative", "Negative")), class = "data.frame", row.names = c(NA, 
    -6L))