Tags: r, filter, stringr, uppercase, grepl

Matching several words with grepl, in both upper and lower case


I have the following data.frame: [image: data frame]

I'm using dplyr and stringr, and I want to filter the column Nombre in the following way: retain all rows that contain "regimen" or "promocion" or "REGIMEN" or "PROMOCION", i.e., both in uppercase and lowercase. I tried:

str_view(df$Nombre, regex("regimen|promocion", ignore_case=T))

but in that case, it only retains the first word (regimen) both in upper and lower case. If I remove ignore_case=T, it finds both "regimen" and "promocion" but case sensitive, i.e., only lowercase.

Of course, this is just an example. I need to filter on lots of words, not just "regimen" and "promocion", which is why I don't filter on each word separately.


Solution

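  • Since the data seems to be in Spanish, I would use a slightly more sophisticated regexp that also catches accented letters: putting \\w where the vowel goes matches both "e" and "é" (and any other word character).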

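    library(tidyverse)  # loads dplyr (filter) and stringr (str_detect, regex)

    # Small example data with accented, unaccented, upper- and lowercase values
    df <- data.frame(
      N = c(100, 12345, 666, 888),
      Nombre = c("RÉGIMEN", "promoción", "ley", "otro regimen")
    )

    # \\w matches any word character, so it covers both "e" and "é";
    # ignore_case = TRUE makes the whole alternation case-insensitive
    df %>%
      filter(str_detect(Nombre, regex("r\\wgimen|promoci\\wn", ignore_case = TRUE)))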
    #>       N       Nombre
    #> 1   100      RÉGIMEN
    #> 2 12345    promoción
    #> 3   888 otro regimen
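
Since the question mentions needing to filter on many words, the same idea can be sketched with the pattern built from a character vector instead of typed by hand. This is only a minimal sketch: the names `palabras`, `patron` and `resultado` are made up for illustration, and it assumes each word is already written as a regexp fragment (with \\w where an accent may appear), as in the answer above.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  N = c(100, 12345, 666, 888),
  Nombre = c("RÉGIMEN", "promoción", "ley", "otro regimen")
)

# Hypothetical word list; extend it with as many words as needed.
# \\w stands in for a letter that may or may not carry an accent.
palabras <- c("r\\wgimen", "promoci\\wn")

# Collapse the words into a single alternation: "r\\wgimen|promoci\\wn"
patron <- str_c(palabras, collapse = "|")

# Keep the rows whose Nombre matches any of the words, ignoring case
resultado <- df %>%
  filter(str_detect(Nombre, regex(patron, ignore_case = TRUE)))

resultado
```

With the example data this should keep the same three rows as the hand-written pattern (N = 100, 12345 and 888). If the words come from user input and may contain regexp metacharacters, they would need escaping before being pasted together.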