Using unnest_tokens() to split a column by a specific character?

I'm working with a column in which each value is a set of URLs stored as a single string, with the URLs separated by commas:

column_with_urls
["url.a, url.b, url.c"]
["url.d, url.e, url.f"]

I would like to use the tidytext::unnest_tokens() R function to split these into one URL per row (although I'm open to other solutions, preferably in R). I've read the documentation, but I can't tell whether it's possible, or advisable, to pass a single character to split on.

My thought is something like unnest_tokens(url, column_with_urls, by = ','). Is there a way to specify that kind of argument and/or a better way to solve this problem?

My desired output is a data frame with one URL per row, like this (with the other columns of each original row copied to every new row it produces):

url
url.a
url.b
url.c
...

Thanks in advance.

Solution

unnest_tokens() can split on a regular-expression pattern: set token = "regex" and supply the pattern through the pattern argument. The example below splits on a comma followed by any whitespace; the same option works for more complex patterns.

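Note that this returns a tibble, even if the input is a plain data.frame. Also, unnest_tokens() lowercases its output by default (to_lower = TRUE), so pass to_lower = FALSE if the case of your URLs matters.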

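my_df <- data.frame(id = 1:2,
                    urls = c("url.a, url.b, url.c",
                             "url.d, url.e, url.f"),
                    stringsAsFactors = FALSE)  # keep urls as character in R < 4.0
# token = "regex" forwards `pattern` to the tokenizer (tokenizers::tokenize_regex).
# ",\\s*" also consumes the space after each comma, so no token keeps a leading
# blank; to_lower = FALSE leaves case-sensitive URL paths untouched.
tidytext::unnest_tokens(my_df, out, urls, token = "regex", pattern = ",\\s*",
                        to_lower = FALSE)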
# # A tibble: 6 × 2
#     id    out
#   <int>  <chr>
# 1     1  url.a
# 2     1  url.b
# 3     1  url.c
# 4     2  url.d
# 5     2  url.e
# 6     2  url.f
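If you would rather avoid the tidytext dependency, the same reshaping can be done in base R. The sketch below is one possible approach, not part of tidytext: `split_urls_long()` is a hypothetical helper name, and it assumes the URLs are separated by commas with optional whitespace. It splits each string with strsplit(), then repeats each original row once per piece, which copies the other columns (here `id`) onto every new row.

```r
# Hypothetical helper (base R only): one row per comma-separated value in `col`.
split_urls_long <- function(df, col, sep = ",\\s*") {
  # One character vector of pieces per input row (as.character guards against factors)
  pieces <- strsplit(as.character(df[[col]]), sep)
  # Repeat each original row once per piece, so the other columns are copied over
  out <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
  # trimws() removes any leftover leading/trailing spaces around each URL
  out[[col]] <- trimws(unlist(pieces))
  rownames(out) <- NULL
  out
}

my_df <- data.frame(id = 1:2,
                    urls = c("url.a, url.b, url.c", "url.d, url.e, url.f"),
                    stringsAsFactors = FALSE)

# Six rows: id 1, 1, 1, 2, 2, 2 with urls url.a through url.f
split_urls_long(my_df, "urls")
```

Unlike the unnest_tokens() call, this keeps the original column name (`urls`) and returns a plain data.frame. If you already use the tidyverse, tidyr::separate_rows(my_df, urls, sep = ",\\s*") performs the same split in a single call.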