I have a data.frame df with 2 columns, the first six rows are shown here but it has many more blocksequences, each spanning 3 rows:
blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215
M049-S215-S085 ICD-85
M049-S213-S044 ECDTM-49
M049-S213-S044 ICD-213
M049-S213-S044 ICD-44
I want to convert it to this format:
blocksequenceid description1 description2 description3
M049-S215-S085 ECDTM-49 ICD-215 ICD-85
M049-S213-S044 ECDTM-49 ICD-213 ICD-44
I've thought about dcast and reshape but I don't know what to do when it says ERROR: column time not found
for reshape and I'm not sure dcast is the right function to use here. This is what I've tried:
reshape(df, idvar='blocksequenceid', timevar = 'description', direction = 'wide')
reshape(df, idvar='blocksequenceid', v.names = 'description', direction = 'wide')
I'm sure this is so simple but there's just something I'm missing.
Here is the reproducible data.
t <- 'blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215
M049-S215-S085 ICD-85'
df <- read.table(text = t, header = T)
Here is a possible solution.
library(tidyverse)
df %>%
rename(description1 = description) %>%
mutate(description = row_number()) %>%
spread(description, description1, sep = "")
# blocksequenceid description1 description2 description3
# 1 M049-S215-S085 ECDTM-49 ICD-215 ICD-85
t <- 'blocksequenceid description
M049-S215-S085 ECDTM-49
M049-S215-S085 ICD-215
M049-S215-S085 ICD-85
M049-S213-S044 ECDTM-49
M049-S213-S044 ICD-213
M049-S213-S044 ICD-44'
df <- read.table(text = t, header = T)
In the updated data, you should do group_by(blocksequenceid)
first.
library(tidyverse)
df %>%
rename(description1 = description) %>%
group_by(blocksequenceid) %>%
mutate(description = row_number()) %>%
spread(description, description1, sep = "")
# # A tibble: 2 x 4
# # Groups: blocksequenceid [2]
# blocksequenceid description1 description2 description3
# <chr> <chr> <chr> <chr>
# 1 M049-S213-S044 ECDTM-49 ICD-213 ICD-44
# 2 M049-S215-S085 ECDTM-49 ICD-215 ICD-85