Not sure I worded my question all that well, but it's essentially what I am trying to do.
Data example:
Data <- c("NELIG_Q1_1_C1_A", "NELIG_N1_1_EG1_B", "NELIG_V2_1_NTH_C", "NELIG_Q2_1_C5_Q",
"NELIG_N1_1_C1_RA", "NELIG_Q1_1_EG1_QR", "NELIG_V2_1_NTH_PQ", "NELIG_N2_1_C5_PRQ")
I want to filter using str_detect on the last set of letter combinations. There will always be four "_" before the string/pattern I am looking for, but after the fourth "_" there could be many different letter combinations. In the above example I am trying to detect only the letter "Q".
If I do a simple
Data2 <- Data %>% filter(str_detect(column, "Q"))
I would get all rows that have Q anywhere in the string. How can I tell it to focus on the last section only?
If the aim is to detect/match those strings that contain Q in the 'section' after the last _, then this works:
grep("_[A-Z]*Q[A-Z]*$", Data, value = TRUE, perl = TRUE)
[1] "NELIG_Q2_1_C5_Q" "NELIG_Q1_1_EG1_QR" "NELIG_V2_1_NTH_PQ" "NELIG_N2_1_C5_PRQ"
or, with str_detect:
library(stringr)
str_detect(Data, "_[A-Z]*Q[A-Z]*$")
[1] FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
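Since the question asks about filter, here is a minimal sketch of the same pattern inside a dplyr pipeline. The data frame `df` and the objects `Data2` and `Data3` are made up for illustration, and the column name `column` is borrowed from the question. The second pattern is a looser variant that is not part of the answer above; it may be useful if the last section can contain characters other than uppercase letters.

```r
library(dplyr)
library(stringr)

Data <- c("NELIG_Q1_1_C1_A", "NELIG_N1_1_EG1_B", "NELIG_V2_1_NTH_C", "NELIG_Q2_1_C5_Q",
          "NELIG_N1_1_C1_RA", "NELIG_Q1_1_EG1_QR", "NELIG_V2_1_NTH_PQ", "NELIG_N2_1_C5_PRQ")

# Hypothetical data frame: the column name `column` is borrowed from the question.
df <- data.frame(column = Data, stringsAsFactors = FALSE)

# Keep rows whose last section (after the final "_") is made of uppercase
# letters and contains a Q.
Data2 <- df %>% filter(str_detect(column, "_[A-Z]*Q[A-Z]*$"))

# Looser variant (an assumption, not from the answer above): a Q followed only
# by non-underscore characters up to the end of the string.
Data3 <- df %>% filter(str_detect(column, "Q[^_]*$"))
```

On this data both filters keep the same four rows. Note that the looser pattern would also match a string that contains a Q but no underscore at all, so it only fits if every value really has the four-underscore layout.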
Data:
Data <- c("NELIG_Q1_1_C1_A", "NELIG_N1_1_EG1_B", "NELIG_V2_1_NTH_C", "NELIG_Q2_1_C5_Q",
"NELIG_N1_1_C1_RA", "NELIG_Q1_1_EG1_QR", "NELIG_V2_1_NTH_PQ", "NELIG_N2_1_C5_PRQ")