
How to collapse vectors inside a dataframe into strings


I'm creating a new variable inside my dataframe from str_extract_all, which results in a column with some vectors inside it. A similar result comes from this manipulation of iris:

library(dplyr)
library(stringr)

t = iris %>% mutate(test = str_extract_all(Species, 's\\w')) %>% arrange(Sepal.Width)
head(t)
Sepal.Length Sepal.Width Petal.Length Petal.Width    Species          test
         5.0         2.0          3.5         1.0 versicolor            si
         6.0         2.2          4.0         1.0 versicolor            si
         6.2         2.2          4.5         1.5 versicolor            si
         6.0         2.2          5.0         1.5  virginica  character(0)
         4.5         2.3          1.3         0.3     setosa c("se", "sa")
         5.5         2.3          4.0         1.3 versicolor            si

I'd like to collapse the "setosa" result into "se, sa" or something similar, and have the "virginica" result as NA.

str_flatten() and paste(, collapse='') collapse the whole column into a single string (too long to show here).

How can I collapse only the desired vectors, or get the result directly from str_extract()?


Solution

  • Try a variant of

    iris |>
      dplyr::mutate(test = stringi::stri_extract_all_regex(Species, 's\\w', simplify = TRUE))
    

    Note: I do not know whether stringi::stri_extract_all_*() has an argument that pads missing matches with NA when the result spans more than one column.
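
If a single string per row is preferred over a matrix, one possible approach is to collapse each element of the list returned by str_extract_all() separately, rather than passing the whole column to paste() or str_flatten(). The sketch below is only an illustration, not the answerer's method: collapse_matches() is a hypothetical helper introduced here, and the ", " separator is an assumption about the desired format.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not part of stringr or stringi): collapse one vector
# of matches into a single string, or return NA when nothing matched.
collapse_matches <- function(x, sep = ", ") {
  if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
}

res <- iris %>%
  mutate(
    # str_extract_all() returns a list with one character vector per row;
    # vapply() applies the helper to each vector and returns a plain
    # character column of the same length.
    test = vapply(
      str_extract_all(as.character(Species), "s\\w"),
      collapse_matches,
      character(1)
    )
  )

# One row per species: setosa gives "se, sa", versicolor gives "si",
# and virginica gives NA.
res %>% distinct(Species, test)
```

Because vapply() works on one list element at a time, only the matches from the same row are joined, which avoids the whole-column collapse described in the question.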