So I have the following data, let's say called "my_data":
Storm.Type
TYPHOON
SEVERE STORM
TROPICAL STORM
SNOWSTORM AND HIGH WINDS
What I want is to classify whether or not each element in my_data$Storm.Type is a storm, BUT I don't want to include tropical storms as storms (I'm going to classify them separately), such that I would have
Storm.Type Is.Storm
TYPHOON 0
SEVERE STORM 1
TROPICAL STORM 0
SNOWSTORM AND HIGH WINDS 1
I have written the following code:
my_data$Is.Storm <- my_data[grep("(?<!TROPICAL) (?i)STORM"), "Storm.Type"]
But this only returns the "SEVERE STORM" as a storm (but leaves out SNOWSTORM AND HIGH WINDS). Thank you!
The problem is that you're looking for the string " STORM"
with a preceding space, so "SNOWSTORM"
does not qualify.
As a fix, consider moving the space into your negative lookbehind assertion, like so:
ss <- c("TYPHOON","SEVERE STORM","TROPICAL STORM","SNOWSTORM AND HIGH WINDS",
"THUNDERSTORM")
grep("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] 2 4 5
grepl("(?<!TROPICAL )(?i)STORM", ss, perl = TRUE)
# [1] FALSE TRUE FALSE TRUE TRUE
I didn't know that (?i)
and (?-i)
set whether you ignore case or not in regex. Cool find. Another way to do it is the ignore.case
flag:
grepl("(?<!tropical )storm", ss, perl = TRUE, ignore.case = TRUE)
# [1] FALSE TRUE FALSE TRUE TRUE
Then define your column:
my_data$Is.Storm <- grepl("(?<!tropical )storm", my_data$Storm.Type,
perl = TRUE, ignore.case = TRUE)