I have a dataset with a complex column results
as (character
class).
Data sample is here.
date_agenda id_question id_event results
2020-09-04 202009043 7426 "2:1:3|3:1:3|4:8:4|5:1:4|6:7:0|7:7:4|8:7:3|9:3:0|10:8:4|11:1:0|12:8:4|13:8:4|14:8:1|15:1:3|16:1:0|17:1:3|18:1:3|"
2020-09-04 202009044 7427 "2:1:3|3:1:3|4:8:4|5:1:4|6:7:0|7:7:4|8:7:3|9:3:0|10:8:4|11:1:0|12:8:4|13:8:4|14:8:3|15:1:3|16:1:0|17:1:3|18:1:3|"
2020-09-04 202009046 7428 "2:1:1|3:1:1|4:8:4|5:1:4|6:7:0|7:7:4|8:7:1|9:3:0|10:8:4|11:1:0|12:8:4|13:8:4|14:8:3|15:1:1|16:1:0|17:1:1|18:1:1|"
I would like to convert results
and the whole dataframe from wide to long format, splitting the last column results
into three numeric columns.
For example, 2:1:3
is one event in results
and there hundreds of such events in this column.
My desired output:
date_agenda id_question id_event mp_id mp_fraction result
2020-09-04 202009043 7426 2 1 3
2020-09-04 202009043 7426 3 1 3
2020-09-04 202009043 7426 4 8 4
2020-09-04 202009043 7426 5 1 4
2020-09-04 202009043 7426 6 7 0
Basically, one row in the initial wide format (4 columns) will be converted into hundreds of rows in the long format (6 columns).
I understand that it should be done with pivot_longer
, pivot_wider
(or gather
and spread
) and maybe melt
, but not sure that I have the logic now. Whether I should split the character first into hundreds of columns and then make it wide or something else.
Appreciate tips! Thanks!
We can use separate_rows
to split results
on "|"
into different rows and separate
to split on ":"
in three different columns.
library(dplyr)
library(tidyr)
df %>%
separate_rows(results, sep = '\\|') %>%
#To remove empty values generated from last "|"
filter(results != '') %>%
separate(results,c("mp_id", "mp_fraction", "result"), sep = ":",convert = TRUE)
# A tibble: 51 x 6
# date_agenda id_question id_event mp_id mp_fraction result
# <chr> <int> <int> <int> <int> <int>
# 1 2020-09-04 202009043 7426 2 1 3
# 2 2020-09-04 202009043 7426 3 1 3
# 3 2020-09-04 202009043 7426 4 8 4
# 4 2020-09-04 202009043 7426 5 1 4
# 5 2020-09-04 202009043 7426 6 7 0
# 6 2020-09-04 202009043 7426 7 7 4
# 7 2020-09-04 202009043 7426 8 7 3
# 8 2020-09-04 202009043 7426 9 3 0
# 9 2020-09-04 202009043 7426 10 8 4
#10 2020-09-04 202009043 7426 11 1 0
# … with 41 more rows