I have a data frame in R like this:
ID Type
---------------------------
1 Green-Red-Red-Green
2 Pink-Blue-Red-Red
3 Green-Green-Red
4 Pink-Blue-Blue-Green
5 Red-Red-Red-Green
So, I want to count the number of rows containing the words Green and Red but neither Pink nor Blue.
In this case, the number would be 3 (the rows with ID = 1, 3 and 5).
I can't figure out how to do this with multiple criteria on character data. How can I do that, please?
You can do
`library(data.table)`
`dt <- as.data.table(data_frame) # convert your data frame to a data.table
# & stands for AND, ! stands for NOT
nrow(dt[Type %like% "Green" & Type %like% "Red" &
!(Type %like% "Pink") & !(Type %like% "Blue")])`
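For comparison, the same filter can be sketched in base R with `grepl`, without data.table. This is only an illustration, not the method used above: the sample data is copied from the question, and the helper `has` and the vector `keep` are hypothetical names introduced here.

```r
# Sample data copied from the question; data_frame is the name used in the answer.
data_frame <- data.frame(
  ID = 1:5,
  Type = c("Green-Red-Red-Green", "Pink-Blue-Red-Red", "Green-Green-Red",
           "Pink-Blue-Blue-Green", "Red-Red-Red-Green"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each row whose Type contains the word.
# fixed = TRUE treats the word as a literal substring, not a regex.
has <- function(word) grepl(word, data_frame$Type, fixed = TRUE)

# Green AND Red AND NOT Pink AND NOT Blue
keep <- has("Green") & has("Red") & !has("Pink") & !has("Blue")

sum(keep)            # number of matching rows: 3
data_frame$ID[keep]  # IDs of the matching rows: 1 3 5
```

Summing the logical vector counts the matching rows directly, so no subsetting is needed when only the count is wanted.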
UPDATE, following the question in the comments
This will give you the number of characters between "Pink" and "Blue":
library(stringr) # provides str_match
string <- "Pink-Green-Blue-Red"
tmp <- str_match(string, "Pink(.*?)Blue") # column 1: full match, column 2: captured text
nchar(tmp[,2])
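So you can do
dt[, tmp := str_match(Type, "Pink(.*?)Blue")[, 2]] # keep only the captured text; NA where there is no match
nrow(dt[!is.na(tmp)]) # number of rows where "Pink" is later followed by "Blue"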
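If stringr is not available, roughly the same extraction can be sketched in base R with `regexec` and `regmatches`. This swaps in a different technique from the one above. The vector `types` is illustrative input (one string from the answer, two from the question), and `m` and `between` are hypothetical names.

```r
# Illustrative input: one string from the answer, two from the question.
types <- c("Pink-Green-Blue-Red", "Green-Red-Red-Green", "Pink-Blue-Red-Red")

# regexec() locates the full match and its capture groups;
# regmatches() extracts them as one character vector per input string.
m <- regmatches(types, regexec("Pink(.*?)Blue", types, perl = TRUE))

# The second element is the captured text; use NA when there was no match.
between <- vapply(
  m,
  function(x) if (length(x) >= 2) x[2] else NA_character_,
  character(1)
)

nchar(between)        # characters between "Pink" and "Blue", NA where no match
sum(!is.na(between))  # number of strings where "Pink" is later followed by "Blue"
```

Here the lazy quantifier `.*?` stops at the first "Blue" after "Pink", which is the same behaviour the `str_match` pattern relies on.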