I'm working with a column in which each entry is a single string containing several urls separated by commas:
column_with_urls
["url.a, url.b, url.c"]
["url.d, url.e, url.f"]
I would like to use the R function tidytext::unnest_tokens() to separate these out into one url per line (although I'm open to other, preferably R-based, solutions). I've read the docs here but I can't tell if it's possible/advisable to enter a single character to split on.
My thought is something like unnest_tokens(url, column_with_urls, by = ','). Is there a way to specify that kind of argument and/or a better way to solve this problem?
My desired output is a dataframe with one url per row like this (and all other data for the original rows copied over to each row):
url
url.a
url.b
url.c
...
Thanks in advance.
The unnest_tokens function has an option for you to split on a regex pattern. Below is the example syntax to split on a comma using this option (you could also use it for more complex patterns).
Note that this will convert the class of your input data to a tibble:
my_df = data.frame(id = 1:2,
                   urls = c("url.a, url.b, url.c",
                            "url.d, url.e, url.f"))
tidytext::unnest_tokens(my_df, out, urls, token = 'regex', pattern = ",")
# # A tibble: 6 × 2
#      id out
#   <int> <chr>
# 1     1 url.a
# 2     1 url.b
# 3     1 url.c
# 4     2 url.d
# 5     2 url.e
# 6     2 url.f
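
Since the question is open to other R-based approaches, here is a minimal sketch of an alternative that swaps in tidyr::separate_rows() for unnest_tokens(). It assumes the tidyr package is installed; `long_df` is just an illustrative name, and the separator pattern `",\\s*"` is my choice so that the space after each comma is consumed along with the comma. One thing to keep in mind if you stay with unnest_tokens(): it lowercases its output by default (to_lower = TRUE), which may not be what you want for urls with case-sensitive paths.

```r
library(tidyr)

# Same example data as in the answer above.
my_df <- data.frame(
  id   = 1:2,
  urls = c("url.a, url.b, url.c", "url.d, url.e, url.f")
)

# Split each string on a comma plus any following whitespace.
# The other columns (here `id`) are repeated for every resulting row.
long_df <- separate_rows(my_df, urls, sep = ",\\s*")

# Staying with tidytext, the same pattern can be combined with
# to_lower = FALSE to keep the original case of each url:
# tidytext::unnest_tokens(my_df, out, urls, token = "regex",
#                         pattern = ",\\s*", to_lower = FALSE)
```

Unlike unnest_tokens(), separate_rows() writes the split values back into the original column (`urls` here) rather than into a new one, so rename it afterwards if you want a column called `url`.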