Search code examples
rsubstringstring-matching

Checking if a substring occurs in a separate column


I need to check whether my data contains names that a benchmark study doesn't. The problem that I have is that my names are truncated, so are generally a substring of the names reported by the benchmarking study.

For example, a truncated version of my data:

      Benchmark             `My Data`
Southern Corp                Avista
American Electric Power      Atmos
Atmos Energy                 Southern
Delta Natural Gas Co Inc     DUET
Avista Corp                  Duke Energy
Duke Energy Corp             American Electric
Pepco Holdings               Nextra Energy

For example, you can see that in "My Data", the name is reported as "Southern", but in the Benchmark it is reported as "Southern Corp", similarly, my data will report "Duke Energy" as "Duke Energy Corp".

I need to find out whether my data set contains any names that the benchmarking set doesn't.

So I'd need a flag or a helper column which tells me that my data set includes "Nextra Energy", but the Benchmarking doesn't.

I want the output like:

    Benchmark               `My Data`            Not in Benchmark
Southern Corp                Avista                      0
American Electric Power      Atmos                       0
Atmos Energy                 Southern                    0
Delta Natural Gas Co Inc     DUET                        1
Avista Corp                  Duke Energy                 0
Duke Energy Corp             American Electric           0
Pepco Holdings               Nextra Energy               1 

Thanks!


Solution

  • dat$flag <- NA
    for(i in 1:nrow(dat)){
      if(any(grepl(dat$Data[i],dat$Benchmark))){
        dat$flag[i] <- 0
      } else{
        dat$flag[i] <- 1
      }
    }
    
    dat
                     Benchmark              Data flag
    1            Southern Corp            Avista    0
    2  American Electric Power             Atmos    0
    3             Atmos Energy          Southern    0
    4 Delta Natural Gas Co Inc              DUET    1
    5              Avista Corp       Duke Energy    0
    6         Duke Energy Corp American Electric    0
    7           Pepco Holdings     Nextra Energy    1