
Vectorised comparison of strings to a single value in Rcpp


The == operator in Rcpp works as expected when comparing a numeric vector against a single value: each element of the vector is compared to the value and a logical vector is returned. For example, the following behaves as expected:

library(Rcpp)
cppFunction('
CharacterVector test_vals(NumericVector x) {
  if (is_true(any(x == 3))) return ("Values include 3");
  return ("3 not found");
}')
test_vals(1:2)
# [1] "3 not found"
test_vals(1:5)
# [1] "Values include 3"

However, if I try to compare a character vector against a character scalar, it only seems to test the first element of the vector:

cppFunction('
CharacterVector test_names(NumericVector x) {
  CharacterVector y = x.attr("names");
  if (is_true(any(y == CharacterVector::create("foo")))) return ("Names include foo");
  return ("foo not found");
}')
test_names(c(a=1, b=2, foo=3))
# [1] "foo not found"
test_names(c(foo=3, a=1, b=2))
# [1] "Names include foo"

I know that comparing two character vectors of the same length appears to work in a vectorised manner, as expected:

cppFunction('
CharacterVector test_names(NumericVector x) {
  CharacterVector y = x.attr("names");
  CharacterVector foo(x.size());
  foo.fill("foo");
  if (is_true(any(y == foo))) return ("Names include foo");
  return ("foo not found");
}')
test_names(c(a=1, b=2, foo=3))
# [1] "Names include foo"
test_names(c(foo=3, a=1, b=2))
# [1] "Names include foo"
test_names(c(a=1, b=2))
# [1] "foo not found"

Does this mean that comparison of a character vector against a single value has not been implemented in Rcpp, or am I just missing how to do it?


Solution

  • Following up on our quick discussion, here is a very simple solution, as the problem (as posed) is simple: no regular expressions, no fanciness. Just loop over all elements and return true as soon as a match is found; otherwise return false.

    Code

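    #include <Rcpp.h>
    
    // Returns true if any element of sv equals txt.
    // Rcpp converts the incoming R character vector to std::vector<std::string>.
    // [[Rcpp::export]]
    bool contains(std::vector<std::string> sv, std::string txt) {
        for (const auto& s : sv) {          // const reference avoids copying each string
            if (s == txt) return true;      // stop at the first match
        }
        return false;                       // no element matched
    }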
    
    /*** R
    sv <- c("a", "b", "c")
    contains(sv, "foo")
    sv[2] <- "foo"
    contains(sv, "foo")
    */
    

    Demo

    > Rcpp::sourceCpp("~/git/stackoverflow/66895973/answer.cpp")
    
    > sv <- c("a", "b", "c")
    
    > contains(sv, "foo")
    [1] FALSE
    
    > sv[2] <- "foo"
    
    > contains(sv, "foo")
    [1] TRUE
    > 
    

    And that is really just shooting from the hip before looking for either what we may already have in the (roughly) 100k lines of Rcpp, or what the STL may have...
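
As one possibility for "what the STL may have", the same check can be sketched with std::find from <algorithm>. This is a minimal sketch in plain standard C++, not something taken from Rcpp itself; the name contains_stl is hypothetical. To call it from R, the #include <Rcpp.h> and // [[Rcpp::export]] lines from the code above would presumably need to be added.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: same contract as contains() above,
// but delegating the loop to std::find.
bool contains_stl(const std::vector<std::string>& sv, const std::string& txt) {
    // std::find returns sv.end() when no element compares equal to txt
    return std::find(sv.begin(), sv.end(), txt) != sv.end();
}
```

Like the hand-written loop, std::find stops at the first matching element.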

    The same will work for your earlier example of named attributes, as you can of course do the same with a CharacterVector, and/or use the conversion from it to std::vector<std::string> that we used here, or... If you have an older compiler, switch the for loop from the C++11 range-based style to the classic index-based (K&R) style.
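
For an older compiler, the index-based form of the loop might look like the following. It is only a sketch in plain standard C++ (no C++11 features), and the name contains_indexed is hypothetical; the Rcpp include and export attribute from the code above are left out here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: pre-C++11 variant of contains() using an index loop.
bool contains_indexed(const std::vector<std::string>& sv, const std::string& txt) {
    for (std::size_t i = 0; i < sv.size(); ++i) {
        if (sv[i] == txt) return true;   // stop at the first match
    }
    return false;                        // no element matched
}
```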