Indicate in df columns occurrence of keywords of another column in R

Assuming a data frame df with several columns, of which a "description" field,

and assuming a set of keywords stored within a separate vector keywords, what is the best practice to:

create columns in df for each of these keywords, named accordingly ;
and storing the count of its own number of occurrences within the "description" field of df?

e.g.

(df <- data.frame(
  ID = letters[1:10],
  DESCRIPTION = c("blue", "red", "this was green", "this was red", "blue and red", "green", NA, "green", "green, blue, and red", NA)
))
   ID          DESCRIPTION
1   a                 blue
2   b                  red
3   c       this was green
4   d         this was red
5   e         blue and red
6   f                green
7   g                 <NA>
8   h                green
9   i green, blue, and red
10  j                 <NA>
keywords <- c("blue", "red", "green")

would return

   ID          DESCRIPTION blue red green
1   a                 blue    1   0     0
2   b                  red    0   1     0
3   c       this was green    0   0     1
4   d         this was red    0   1     0
5   e         blue and red    1   1     0
6   f                green    0   0     1
7   g                 <NA>    0   0     0
8   h                green    0   0     1
9   i green, blue, and red    1   1     1
10  j                 <NA>    0   0     0

Prefereable using base R or dplyr (eg. avoiding data.table).

Note: the answer needs to be scalable (many possible keywords).

Solution

An approach using unnest, str_count and pivot_wider, assuming all occurrences of the keywords within all strings per row have to be counted. Using a slightly modified data set to show multiple counts

library(dplyr)
library(tidyr)
library(stringr)

df %>% 
  mutate(nms = list(!!keywords)) %>% 
  unnest(nms) %>% 
  rowwise() %>% 
  mutate(values = str_count(DESCRIPTION, nms), 
         values = replace(values, is.na(values), 0)) %>% 
  ungroup() %>% 
  pivot_wider(names_from=nms, values_from=values)
# A tibble: 10 × 5
   ID    DESCRIPTION           blue   red green
   <chr> <chr>                <dbl> <dbl> <dbl>
 1 a     blue blue                2     0     0
 2 b     red                      0     1     0
 3 c     this was green           0     0     1
 4 d     this was red             0     1     0
 5 e     blue and red             1     1     0
 6 f     green                    0     0     1
 7 g     NA                       0     0     0
 8 h     green                    0     0     1
 9 i     green, blue, and red     1     1     1
10 j     NA                       0     0     0

or with base R sapply and str_count

library(stringr)

cbind(df, sapply(keywords, function(x){
  res <- str_count(df$DESCRIPTION, x)
  replace(res, is.na(res), 0)}))
   ID          DESCRIPTION blue red green
1   a            blue blue    2   0     0
2   b                  red    0   1     0
3   c       this was green    0   0     1
4   d         this was red    0   1     0
5   e         blue and red    1   1     0
6   f                green    0   0     1
7   g                 <NA>    0   0     0
8   h                green    0   0     1
9   i green, blue, and red    1   1     1
10  j                 <NA>    0   0     0

mod. data

df <- structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j"), DESCRIPTION = c("blue blue", "red", "this was green", 
"this was red", "blue and red", "green", NA, "green", "green, blue, and red", 
NA)), row.names = c(NA, -10L), class = "data.frame")