Search code examples
rgroup-byrenamedplyrspread

reshape or dcast long to wide with no value.var 2 columns


I have a data.frame df with 2 columns, the first six rows are shown here but it has many more blocksequences, each spanning 3 rows:

blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215 
M049-S215-S085 ICD-85
M049-S213-S044 ECDTM-49
M049-S213-S044 ICD-213
M049-S213-S044 ICD-44

I want to convert it to this format:

blocksequenceid description1 description2 description3 
M049-S215-S085 ECDTM-49 ICD-215 ICD-85
M049-S213-S044 ECDTM-49 ICD-213 ICD-44

I've thought about dcast and reshape but I don't know what to do when it says ERROR: column time not found for reshape and I'm not sure dcast is the right function to use here. This is what I've tried:

reshape(df, idvar='blocksequenceid', timevar = 'description', direction = 'wide')
reshape(df, idvar='blocksequenceid', v.names = 'description', direction = 'wide')

I'm sure this is so simple but there's just something I'm missing.


Solution

  • Here is the reproducible data.

    t <- 'blocksequenceid description
    M049-S215-S085 ECDTM-49
    M049-S215-S085 ICD-215 
    M049-S215-S085 ICD-85'
    
    df <- read.table(text = t, header = T)
    

    Here is a possible solution.

    library(tidyverse)
    df %>% 
      rename(description1 = description) %>%
      mutate(description = row_number()) %>%
      spread(description, description1, sep = "")
    
    #   blocksequenceid description1 description2 description3
    # 1  M049-S215-S085     ECDTM-49      ICD-215       ICD-85
    

    Edit for modified data

    t <- 'blocksequenceid description
    M049-S215-S085 ECDTM-49
    M049-S215-S085 ICD-215 
    M049-S215-S085 ICD-85
    M049-S213-S044 ECDTM-49
    M049-S213-S044 ICD-213
    M049-S213-S044 ICD-44'
    
    df <- read.table(text = t, header = T)
    

    In the updated data, you should do group_by(blocksequenceid) first.

    library(tidyverse)
    df %>% 
      rename(description1 = description) %>%
      group_by(blocksequenceid) %>%
      mutate(description = row_number()) %>%
      spread(description, description1, sep = "")
    
    # # A tibble: 2 x 4
    # # Groups:   blocksequenceid [2]
    #   blocksequenceid description1 description2 description3
    #   <chr>           <chr>        <chr>        <chr>       
    # 1 M049-S213-S044  ECDTM-49     ICD-213      ICD-44      
    # 2 M049-S215-S085  ECDTM-49     ICD-215      ICD-85