I need to check whether my data contains names that a benchmark study doesn't. The problem that I have is that my names are truncated, so are generally a substring of the names reported by the benchmarking study.
For example, a truncated version of my data:
Benchmark `My Data`
Southern Corp Avista
American Electric Power Atmos
Atmos Energy Southern
Delta Natural Gas Co Inc DUET
Avista Corp Duke Energy
Duke Energy Corp American Electric
Pepco Holdings Nextra Energy
For example, you can see that in "My Data", the name is reported as "Southern", but in the Benchmark it is reported as "Southern Corp", similarly, my data will report "Duke Energy" as "Duke Energy Corp".
I need to find out whether my data set contains any names that the benchmarking set doesn't.
So I'd need a flag or a helper column which tells me that my data set includes "Nextra Energy", but the Benchmarking doesn't.
I want the output like:
Benchmark `My Data` Not in Benchmark
Southern Corp Avista 0
American Electric Power Atmos 0
Atmos Energy Southern 0
Delta Natural Gas Co Inc DUET 1
Avista Corp Duke Energy 0
Duke Energy Corp American Electric 0
Pepco Holdings Nextra Energy 1
Thanks!
dat$flag <- NA
for(i in 1:nrow(dat)){
if(any(grepl(dat$Data[i],dat$Benchmark))){
dat$flag[i] <- 0
} else{
dat$flag[i] <- 1
}
}
dat
Benchmark Data flag
1 Southern Corp Avista 0
2 American Electric Power Atmos 0
3 Atmos Energy Southern 0
4 Delta Natural Gas Co Inc DUET 1
5 Avista Corp Duke Energy 0
6 Duke Energy Corp American Electric 0
7 Pepco Holdings Nextra Energy 1