How to generate a set of dummy variables dependent on values in several other columns with same prefix in R?

I have a data frame I am trying to put in a format I can use for my analysis that looks the following:

ID	Name	Year	K1	K2	K50
1	Contract XYZ	2000	transport	elephants
2	Agreement ABC	2003	pens	music
3	Document 123	2003	elephants
4	Empty Space	2004	music		transport

Basically, I have documents with a unique ID and the year they were signed plus a set of variables name K_1 to K_50 that contain unordered keywords. For each of the keywords I would like to generate a dummy variable (i.e. named transport, pens...) that is one if any of the K_1 to K_50 entries contains that particular string.

I have tried the code below which creates the dummy with the name transport if K1 or K2 contain the code but for 50 keywords in 50 columns it is a lot of manual work. Ideally, I would be able to use the string as the variable name and run through all 50 columns to identify all keywords and create dummies that will be one if the keyword is present in either column for a particular ID. I would, however, also be happy with manually creating dummies and being able to look at all 50 keywords without typing it all in.

document_dummies <- mutate(document,
                  transport = case_when(
                    K_1 == "transport" | K_2 == "transport" ~ 1,
                    TRUE ~ NA_real_ 
                  ))

Solution

Does this help you?

library(tidyverse)

document <- data.frame(
  stringsAsFactors = FALSE,
  ID = c(1L, 2L, 3L, 4L),
  Name = c("Contract XYZ","Agreement ABC",
           "Document 123","Empty Space"),
  Year = c(2000L, 2003L, 2003L, 2004L),
  K1 = c("transport", "pens", "elephants", "music"),
  K2 = c("elephants", "music", NA, NA),
  K50 = c(NA, NA, NA, "transport")
)
document %>%
  pivot_longer(starts_with("K")) %>%
  select(-name) %>%
  filter(! is.na(value)) %>%
  mutate(has_property = 1) %>%
  pivot_wider(names_from = value, values_from = has_property)
#> # A tibble: 4 x 7
#>      ID Name           Year transport elephants  pens music
#>   <int> <chr>         <int>     <dbl>     <dbl> <dbl> <dbl>
#> 1     1 Contract XYZ   2000         1         1    NA    NA
#> 2     2 Agreement ABC  2003        NA        NA     1     1
#> 3     3 Document 123   2003        NA         1    NA    NA
#> 4     4 Empty Space    2004         1        NA    NA     1

^{Created on 2021-09-21 by the reprex package (v2.0.1)}