I have been trying to reshape some data from long into the wide format. I am interested in having one row per unique ID. To mimic my requirements, I have created a sample input and the desired output as below:
id date size category name type
124 3.1 1 fruit apple royalGala
327 1.1 0 veg chilli green
124 2.1 2 fruit apple green
124 1.2 1 fruit apple jazz
124 2.2 2 fruit apple eve
124 2.1 3 fruit apple pinkLady
327 1.2 1 veg chilli red
327 1.2 2 veg chilli Jalapeño
327 1.2 3 veg chilli bananaPepper
327 3.3 1 veg chilli fresnoPepper
id fruit_apple_royalGala_date fruit_apple_royalGala_size fruit_apple_green_date fruit_apple_green_size fruit_apple_jazz_date fruit_apple_jazz_size fruit_apple_eve_date fruit_apple_eve_size fruit_apple_pinkLady_date fruit_apple_pinkLady_size veg_chilli_green_date veg_chilli_green_size veg_chilli_red_date veg_chilli_red_size veg_chilli_Jalapeño_date veg_chilli_Jalapeño_size veg_chilli_bananaPepper_date veg_chilli_bananaPepper_size veg_chilli_fresnoPepper_date veg_chilli_fresnoPepper_size
124 3.1 1 2.1 2 1.2 1 2.2 2 2.1 3 NA NA NA NA NA NA NA NA NA NA
327 NA NA NA NA NA NA NA NA NA NA 1.1 0 1.2 1 1.2 2 1.2 3 3.3 1
I am unsure how to achieve the desired output. I have looked at some relevant questions at StackOverflow but none of them help me solve this problem e.g., Convert data from long format to wide format with multiple measure columns, From long to wide data with multiple columns and Reshape multiple value columns to wide format.
I have been working on this problem since yesterday but having very little experience with gather and spread etc., I have not been able to solve it. I would greatly appreciate any assistance on this.
Thank you!
For convenience, I am also copying dput()
structure(list(
id = c(124L, 327L, 124L, 124L, 124L, 124L, 327L, 327L, 327L, 327L),
date = c(3.1, 1.1, 2.1, 1.2, 2.2, 2.1, 1.2, 1.2, 1.2, 3.3),
size = c(1L, 0L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L),
category = c("fruit", "veg", "fruit", "fruit", "fruit", "fruit", "veg", "veg", "veg", "veg"),
name = c("apple", "chilli", "apple", "apple", "apple", "apple", "chilli", "chilli", "chilli", "chilli"),
type = c("royalGala", "green", "green", "jazz", "eve", "pinkLady", "red", "Jalapeño", "bananaPepper", "fresnoPepper")),
class = "data.frame", row.names = c(NA, -10L))
I have a work around to solve this problem but my solution fails when I run it on my original data.
# Read the csv file
df = read.csv("C:/Desktop/test.csv")
# Unite multiple columns in to one
df_unite = df %>%
unite("info", category:type, remove = TRUE)
# Conversion from long into wide format
setDT(df_unite) # coerce to data.table
df_wide <- dcast(df_unite, id ~ info,
value.var = c("date", "size"))
df %>%
pivot_wider(id, c(category, name, type),
values_from = c(date, size),
names_glue = '{category}_{name}_{type}_{.value}')