r if-statement string-matching recode

Using mutate, if_else, and str_detect to create a new variable


I have a data frame which consists of parent companies and brands. I would like to clean and recode the brands based on multiple vectors I have created for each brand that contain product models.

#Example dataframe

companies <- c("comp1","comp2","comp3", "comp4")
brands <- c("brand1", "brand2", "brand3", "brand4")

companies_brands <- cbind(companies, brands)
companies_brands <- data.frame(companies_brands)

#output
#Rows: 4
#Columns: 2
#$ companies <chr> "comp1", "comp2", "comp3", "comp4"
#$ brands    <chr> "brand1", "brand2", "brand3", "brand4"

My dataset did not include the product model information, so I have created a product_model vector for each brand myself. See example below.

#Example product_model vectors

brand1_prod_mod <- c("b1prodmod1", "b1prodmod2", "b1prodmod3")
brand2_prod_mod <- c("b2prodmod1", "b2prodmod2", "b2prodmod3")
brand3_prod_mod <- c("b3prodmod1", "b3prodmod2", "b3prodmod3")
brand4_prod_mod <- c("b4prodmod1", "b4prodmod2", "b4prodmod3")

Some brands are incorrectly coded as product models, so I would like to use something like the code below to recode/clean the brands variable. The code below runs, but it only recodes some of the brands correctly. I know this because I compared the original frequencies of brands to those of brand_r. I have tried to ensure that all strings match using various methods like str_replace_all() and tolower(), but it still isn't recoding fully. What's confusing is that when I simply run setdiff() to isolate the difference between companies_brands$brand_r and each individual product_model vector, it properly accounts for all of the matching strings, which confirms that there is no format/space/case difference to fix.

companies_brands_r <- companies_brands %>% mutate(brand_r = 
                                  if_else(str_detect(brands, brand1_prod_mod), "brand1_R",
                                    if_else(str_detect(brands, brand2_prod_mod), "brand2_R",
                                    if_else(str_detect(brands, brand3_prod_mod), "brand3_R",
                                    if_else(str_detect(brands, brand4_prod_mod), "brand4_R", brands)))))

If anyone has any idea what the issue is here, I would greatly appreciate any guidance!


Solution

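  • You're close, but you probably want %in% instead of string matching, and case_when instead of nested if_else calls. str_detect() is vectorised over its pattern argument, so passing a vector of product models compares row 1 with the first model, row 2 with the second, and so on, rather than checking each row against every model. That is why only some rows were recoded.

    For example: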

    library(dplyr)
    
    companies_brands |>
      mutate(brand_r = case_when(brands %in% c("b1prodmod1", "b1prodmod2", "b1prodmod3") ~ "brand1_R",
                                 brands %in% c("b2prodmod1", "b2prodmod2", "b2prodmod3") ~ "brand2_R",
                                 brands %in% c("b3prodmod1", "b3prodmod2", "b3prodmod3") ~ "brand3_R",
                                 brands %in% c("b4prodmod1", "b4prodmod2", "b4prodmod3") ~ "brand4_R",
                                 TRUE ~ brands))
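
    To see why the original attempt only recoded some rows, and how str_detect could still be used, here is a minimal sketch. It assumes the model names contain no regex metacharacters; as_pattern is a hypothetical helper (not part of stringr) that collapses each vector into one anchored alternation, and demo is made-up data.

    ```r
    library(dplyr)
    library(stringr)

    brand1_prod_mod <- c("b1prodmod1", "b1prodmod2", "b1prodmod3")
    brand2_prod_mod <- c("b2prodmod1", "b2prodmod2", "b2prodmod3")

    # Element-wise matching: string 1 is checked only against pattern 1,
    # string 2 only against pattern 2, so neither known model is detected
    elementwise <- str_detect(c("b1prodmod2", "b1prodmod1"),
                              c("b1prodmod1", "b1prodmod2"))
    elementwise
    # [1] FALSE FALSE

    # Hypothetical helper: build one pattern such as
    # "^(b1prodmod1|b1prodmod2|b1prodmod3)$" from a vector of model names
    as_pattern <- function(models) {
      str_c("^(", str_c(models, collapse = "|"), ")$")
    }

    # Made-up data for illustration
    demo <- data.frame(brands = c("brand1", "b1prodmod2", "b2prodmod3"))

    demo_r <- demo |>
      mutate(brand_r = case_when(
        str_detect(brands, as_pattern(brand1_prod_mod)) ~ "brand1_R",
        str_detect(brands, as_pattern(brand2_prod_mod)) ~ "brand2_R",
        TRUE ~ brands
      ))
    ```

    For exact, whole-string matches like these, %in% remains the simpler choice; collapsing into a single regex is mainly useful when partial matches are needed.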
    

    Alternatively, you could do something like this with str_replace (you might need to adapt the regex depending on the names of the products):

    library(dplyr)
    library(stringr)
    
    companies_brands |>
      mutate(brand_r = str_replace(brands, "b(\\d).*", "brand\\1_R"))
    

    Output (identical for the %in% and str_replace approaches):

      companies     brands  brand_r
    1     comp1     brand1   brand1
    2     comp2     brand2   brand2
    3     comp3     brand3   brand3
    4     comp4     brand4   brand4
    5     comp1 b1prodmod1 brand1_R
    6     comp2 b2prodmod2 brand2_R
    7     comp3 b3prodmod3 brand3_R
    8     comp4 b4prodmod3 brand4_R
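
    If there are many brands, or the model names do not share a prefix that one regex can capture, a lookup table may scale better than writing one case_when branch per brand. This is only a sketch of a different technique than the two above: prod_mods, lookup, and demo are made-up names and data, and it relies on base R's stack() and match() plus dplyr's coalesce().

    ```r
    library(dplyr)

    # Named list: each name is the recoded brand, each element its product models
    prod_mods <- list(
      brand1_R = c("b1prodmod1", "b1prodmod2", "b1prodmod3"),
      brand2_R = c("b2prodmod1", "b2prodmod2", "b2prodmod3")
    )

    # stack() turns the named list into a two-column data frame:
    # `values` holds the model names, `ind` the brand label (as a factor)
    lookup <- stack(prod_mods)

    # Made-up data for illustration
    demo <- data.frame(brands = c("brand1", "b1prodmod1", "b2prodmod3"))

    demo_r <- demo |>
      mutate(
        # match() gives the lookup row for each brand, or NA when there is none
        matched = as.character(lookup$ind)[match(brands, lookup$values)],
        # keep the original value wherever no product model matched
        brand_r = coalesce(matched, brands)
      ) |>
      select(-matched)
    ```

    Adding a brand then only means adding one element to prod_mods.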
    

    New data (ideally, include some data that reproduces the actual problem so that answers can be tested properly, e.g. by sharing the output of dput):

    companies <- c("comp1","comp2","comp3", "comp4", "comp1", "comp2", "comp3", "comp4")
    brands <- c("brand1", "brand2", "brand3", "brand4", "b1prodmod1", "b2prodmod2", "b3prodmod3", "b4prodmod3")
    
    companies_brands <- cbind(companies, brands)
    companies_brands <- data.frame(companies_brands)
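
    As a rough illustration of sharing data with dput, the sketch below uses a made-up two-row data frame (sample_df is a hypothetical name). dput() prints R code that recreates the object, so its output can be pasted straight into a question.

    ```r
    # Made-up data standing in for a slice of the real data frame
    sample_df <- data.frame(
      companies = c("comp1", "comp2"),
      brands = c("brand1", "b2prodmod2")
    )

    # Prints a structure(...) call that rebuilds sample_df when run
    dput(sample_df)

    # Round trip: evaluating the printed code should give back the same data
    recreated <- eval(parse(text = capture.output(dput(sample_df))))
    all.equal(recreated, sample_df)
    ```

    For a large data frame, passing only a slice, such as dput(head(sample_df, 20)), usually keeps the output readable.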