I have a column of character string with each row containing information for multiple columns, including the column names. Every cell is structured so one can easily see the different pieces of information, but it is not easy to work with in this format. A cell is formatted like 'variable: value', with multiple inputs separated by commas. I would like to split the column into separate columns with the column names based on what's written before the colon. Here's an example.
my_df <- tibble(
address = c("street: AvenueName, number: 1, code: 1234 AB, city: City One",
"street: AnotherStreet, number: 99, code: 5678 CD, city: Town Two")
)
I know I can split strings by using separate_wider_delim
and then assign the column names to the data.frame. While this works, I would like to see a method that does not depend on setting the column names 'manually', but derive them from the values inside the string.
My expected dataframe would look like this:
expected_df <- tibble(
street= c("AvenueName", "AnotherStreet"),
number = c(1, 99),
code = c("1234 AB", "5678 CD"),
city = c("City One", "Town Two")
)
Without hard-coding column names, here is one approach -
library(dplyr)
library(tidyr)
my_df %>%
#To keep track of each row of data
mutate(row = row_number()) %>%
# Bring data in separate rows splitting on comma
separate_rows(address, sep = ",\\s*") %>%
# Split data on colon to get data in two columns
separate_wider_delim(address, ": ", names = c("col", "value")) %>%
# Get data in wide format
pivot_wider(names_from = col, values_from = value) %>%
# Drop row column
select(-row)
# street number code city
# <chr> <chr> <chr> <chr>
#1 AvenueName 1 1234 AB City One
#2 AnotherStreet 99 5678 CD Town Two
data
I have taken the liberty to fix your data to match it with the expected output.
my_df <- tibble(
address = c("street: AvenueName, number: 1, code: 1234 AB, city: City One",
"street: AnotherStreet, number: 99, code: 5678 CD, city: Town Two")
)