How can I count and return which strings in a list have been detected using str_detect?


I have a dataframe df which contains foods and their corresponding ingredients (df is pasted at the end).

I am interested in which foods contain "Flour", "Water" or "Salt".

Using str_detect you can determine whether the food contains one or more of these ingredients:

library(tidyverse)

strings_to_check <- c("Water", "Salt", "Flour")

df2 <- df %>%
  mutate(Key_Ingredient = str_detect(Ingredients, paste(strings_to_check, collapse = "|")))
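
As a small illustration of what the call above does, the sketch below builds the pattern on its own and runs str_detect on two made-up ingredient strings. The names pattern and detected and the two sample strings are assumptions introduced for this example only.

```r
library(stringr)

strings_to_check <- c("Water", "Salt", "Flour")

# Collapse the vector into a single regex alternation: "Water|Salt|Flour"
pattern <- paste(strings_to_check, collapse = "|")

# str_detect() returns one logical per input string:
# TRUE if any alternative matches anywhere in the string
detected <- str_detect(c("Flour, Milk", "Rice, Tofu"), pattern)
detected
# TRUE FALSE
```

Because the pattern is an unanchored alternation, str_detect only reports whether at least one ingredient is present, not which one or how many.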

How can I go one step further and obtain the count of key ingredients used and return which of the key ingredients were used? Put another way, how do I count how many strings in the list of strings were detected and return the ones that were detected in a separate column of values?

The expected output is:

Food | Ingredients | Key_Ingredient | Key_Count | Key_Used
Appleberry Muffins | Flour, Vanilla Extract, Olive Oil, Milk, Garlic, Carrots, Chicken | TRUE | 2 | Flour, Salt
Blue Moon Pancakes | Baking Powder, Garlic, Eggs, Ice, Sugar, Tofu, Rice | FALSE | 0 | NA
Crystalized Starfruit | Milk, Beef, Tofu, Rice, Salt, Garlic, Mushrooms | TRUE | 1 | Salt
Dragonfruit Delight | Rice, Milk, Pork, Yeast, Carrots, Tofu, Mushrooms | FALSE | 0 | NA
Ethereal Eclairs | Pasta, Flour, Water, Mushrooms, Chicken, Vanilla Extract, Yeast | TRUE | 2 | Flour, Water
Flaming Firefruit | Pepper, Yeast, Vanilla Extract, Sugar, Wheat, Olive Oil, Pork | FALSE | 0 | NA
Glowing Grapes | Garlic, Nutmeg, Beef, Salt, Tofu, Onions, Baking Powder | TRUE | 1 | Salt
Honeydew Haze | Salt, Water, Rice, Yeast, Flour, Honey, Mushrooms | TRUE | 2 | Water, Salt
Iridescent Ice Cream | Water, Salt, Onions, Pasta, Spinach, Pork, Carrots | TRUE | 2 | Water, Salt
Jellybean Jamboree | Salt, Eggs, Flour, Baking Powder, Water, Potatoes, Yeast | TRUE | 2 | Water, Salt
Kiwi Kaleidoscope | Water, Honey, Salt, Potatoes, Vanilla Extract, Pork, Pasta | TRUE | 1 | Water
Lunar Lemons | Salt, Tofu, Olive Oil, Baking Powder, Pork, Vanilla Extract, Cinnamon | TRUE | 1 | Salt
Mystic Marshmallows | Salt, Flour, Onions, Water, Chicken, Eggs, Milk | TRUE | 2 | Flour, Water
Nebula Noodles | Honey, Flour, Pork, Beef, Potatoes, Spinach, Chicken | TRUE | 1 | Flour
Omega Oranges | Mushrooms, Water, Salt, Olive Oil, Spinach, Tofu, Potatoes | TRUE | 2 | Water, Salt
Phantom Peaches | Wheat, Carrots, Baking Powder, Tofu, Eggs, Nutmeg, Potatoes | FALSE | 0 | NA
Quasar Quince | Honey, Tomatoes, Vanilla Extract, Flour, Garlic, Butter, Salt | TRUE | 2 | Flour, Salt
Radiant Raspberries | Salt, Yeast, Garlic, Rice, Sugar, Spinach, Baking Powder | TRUE | 1 | Salt
Stellar Strawberries | Flour, Onions, Spinach, Pork, Yeast, Water, Potatoes | TRUE | 2 | Flour, Water
Twilight Tangerines | Potatoes, Eggs, Kale, Beef, Spinach, Vanilla Extract, Milk | FALSE | 0 | NA
Universal Ugli Fruit | Cinnamon, Yeast, Potatoes, Flour, Salt, Water, Garlic | TRUE | 2 | Water, Salt
Vortex Veggies | Milk, Salt, Flour, Olive Oil, Garlic, Water, Spinach | TRUE | 2 | Water, Salt
Whirlwind Walnuts | Salt, Flour, Beef, Garlic, Milk, Potatoes, Olive Oil | TRUE | 2 | Water, Salt
Xenon Xacuti | Water, Salt, Yeast, Rice, Garlic, Vanilla Extract, Eggs | TRUE | 2 | Water, Salt
Yellow Yams of Yore | Vanilla Extract, Garlic, Chestnuts, Baking Powder, Tofu, Carrots, Sugar | FALSE | 0 | NA
Zephyr Zucchini | Pork, Honey, Baking Powder, Onions, Sugar, Yeast, Water | TRUE | 2 | Water, Salt

The full data for df is:

df <- data.frame(
  Food = c("Appleberry Muffins", "Blue Moon Pancakes", "Crystalized Starfruit", 
           "Dragonfruit Delight", "Ethereal Eclairs", "Flaming Firefruit", 
           "Glowing Grapes", "Honeydew Haze", "Iridescent Ice Cream", 
           "Jellybean Jamboree", "Kiwi Kaleidoscope", "Lunar Lemons", 
           "Mystic Marshmallows", "Nebula Noodles", "Omega Oranges", 
           "Phantom Peaches", "Quasar Quince", "Radiant Raspberries", 
           "Stellar Strawberries", "Twilight Tangerines", "Universal Ugli Fruit", 
           "Vortex Veggies", "Whirlwind Walnuts", "Xenon Xacuti", 
           "Yellow Yams of Yore", "Zephyr Zucchini"),
  Ingredients = c("Flour, Vanilla Extract, Olive Oil, Milk, Garlic, Carrots, Chicken", 
                  "Baking Powder, Garlic, Eggs, Ice, Sugar, Tofu, Rice", 
                  "Milk, Beef, Tofu, Rice, Salt, Garlic, Mushrooms", 
                  "Rice, Milk, Pork, Yeast, Carrots, Tofu, Mushrooms", 
                  "Pasta, Flour, Water, Mushrooms, Chicken, Vanilla Extract, Yeast", 
                  "Pepper, Yeast, Vanilla Extract, Sugar, Wheat, Olive Oil, Pork", 
                  "Garlic, Nutmeg, Beef, Salt, Tofu, Onions, Baking Powder", 
                  "Salt, Water, Rice, Yeast, Flour, Honey, Mushrooms", 
                  "Water, Salt, Onions, Pasta, Spinach, Pork, Carrots", 
                  "Salt, Eggs, Flour, Baking Powder, Water, Potatoes, Yeast", 
                  "Water, Honey, Salt, Potatoes, Vanilla Extract, Pork, Pasta", 
                  "Salt, Tofu, Olive Oil, Baking Powder, Pork, Vanilla Extract, Cinnamon", 
                  "Salt, Flour, Onions, Water, Chicken, Eggs, Milk", 
                  "Honey, Flour, Pork, Beef, Potatoes, Spinach, Chicken", 
                  "Mushrooms, Water, Salt, Olive Oil, Spinach, Tofu, Potatoes", 
                  "Wheat, Carrots, Baking Powder, Tofu, Eggs, Nutmeg, Potatoes", 
                  "Honey, Tomatoes, Vanilla Extract, Flour, Garlic, Butter, Salt", 
                  "Salt, Yeast, Garlic, Rice, Sugar, Spinach, Baking Powder", 
                  "Flour, Onions, Spinach, Pork, Yeast, Water, Potatoes", 
                  "Potatoes, Eggs, Kale, Beef, Spinach, Vanilla Extract, Milk", 
                  "Cinnamon, Yeast, Potatoes, Flour, Salt, Water, Garlic", 
                  "Milk, Salt, Flour, Olive Oil, Garlic, Water, Spinach", 
                  "Salt, Flour, Beef, Garlic, Milk, Potatoes, Olive Oil", 
                  "Water, Salt, Yeast, Rice, Garlic, Vanilla Extract, Eggs", 
                  "Vanilla Extract, Garlic, Chestnuts, Baking Powder, Tofu, Carrots, Sugar", 
                  "Pork, Honey, Baking Powder, Onions, Sugar, Yeast, Water")
)

Solution

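  • You can use the str_count and str_match_all functions. str_count counts every match of the pattern, so it equals the number of key ingredients as long as each one appears at most once per row. str_match_all returns a list of character matrices, so Key_Used becomes a list-column rather than a comma-separated string.

    # Build the alternation pattern once and reuse it
    pattern <- paste(strings_to_check, collapse = "|")

    df2 <- df %>%
        mutate(Key_Ingredient = str_detect(Ingredients, pattern),
               Key_Count = str_count(Ingredients, pattern),
               Key_Used = str_match_all(Ingredients, pattern))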
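
    Since str_match_all gives a list-column, one possible way to get Key_Used as a single comma-separated string (with NA when nothing matched) is sketched below. This is a variant, not the answer's own method: it swaps in str_extract_all, which returns a list of character vectors, and collapses each vector with paste. The names sample_df, Key_Found and result, and the three shortened sample rows, are assumptions made for illustration.

```r
library(dplyr)
library(stringr)

strings_to_check <- c("Water", "Salt", "Flour")
pattern <- paste(strings_to_check, collapse = "|")

# Hypothetical three-row sample in the same shape as df
sample_df <- data.frame(
  Food = c("Ethereal Eclairs", "Blue Moon Pancakes", "Honeydew Haze"),
  Ingredients = c("Pasta, Flour, Water, Mushrooms",
                  "Baking Powder, Garlic, Eggs",
                  "Salt, Water, Rice, Yeast, Flour")
)

result <- sample_df %>%
  mutate(
    # List-column: distinct key ingredients found in each row, in order of appearance
    Key_Found = lapply(str_extract_all(Ingredients, pattern), unique),
    Key_Ingredient = lengths(Key_Found) > 0,
    Key_Count = lengths(Key_Found),
    # Collapse each vector to "A, B"; use NA when nothing matched
    Key_Used = vapply(
      Key_Found,
      function(x) if (length(x) == 0) NA_character_ else paste(x, collapse = ", "),
      character(1)
    )
  ) %>%
  select(-Key_Found)

result
```

    Taking unique() before counting means Key_Count is the number of distinct key ingredients, even if one is listed twice. The match is still substring-based, so an ingredient such as "Saltwater" would count as both "Salt" and nothing else; wrapping the alternation in word boundaries, for example paste0("\\b(", pattern, ")\\b"), is one way to tighten that if it matters for your data.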