Rcpp find unique character vectors


I am learning Rcpp from Hadley Wickham's Advanced R (http://adv-r.had.co.nz/Rcpp.html).

One exercise asks you to implement the R function unique() in Rcpp using an unordered_set (challenge: do it in one line!). The first code chunk below is the solution, which finds the unique values in a numeric vector. I am trying to do the same for a character vector with the second chunk, but it produces an error. Any idea how to implement this simple function by hand? Thanks!

    // Chunk 1: the solution for numeric vectors (works).
    // [[Rcpp::export]]
    std::unordered_set<double> uniqueCC(NumericVector x) {
      return std::unordered_set<double>(x.begin(), x.end());
    }

    // Chunk 2: my attempt for character vectors (produces an error).
    // [[Rcpp::export]]
    std::unordered_set<String> uniqueCC(CharacterVector x) {
      return std::unordered_set<String>(x.begin(), x.end());
    }

Solution

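  • std::unordered_set needs a hash function for its element type. The standard library supplies one for its own types, such as std::string, but for any other type you have to define the hash yourself. String (capital S) is an Rcpp class, not a standard library one.

    The easiest fix is to sidestep the problem by using Rcpp's ability to convert to common STL types, here std::string.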

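    #include <Rcpp.h>
    #include <unordered_set>
    using namespace Rcpp;

    // [[Rcpp::export]]
    std::unordered_set<std::string> uniqueCC(CharacterVector x) {
      // Copy into STL strings, which std::hash supports out of the box.
      auto xv = Rcpp::as<std::vector<std::string>>(x);
      return std::unordered_set<std::string>(xv.begin(), xv.end());
    }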
    
    > x <- sample(letters, 1000, replace = TRUE)
    > uniqueCC(x)
     [1] "r" "o" "c" "n" "f" "s" "y" "l" "i" "j" "m" "v" "t" "p" "u" "x" "w" "k" "g" "a" "d" "q" "z" "b" "h" "e"
    

    Alternatively, take an STL string vector as the argument, and Rcpp will convert the incoming R character vector for you:

    // [[Rcpp::export]]
    std::unordered_set<std::string> uniqueCC(const std::vector<std::string> & x) {
      return std::unordered_set<std::string>(x.begin(), x.end());
    }
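
For completeness, the "define your own hash function" route mentioned at the start of the answer can be sketched in plain C++, without Rcpp. This is only an illustration under stated assumptions: `Label` is a hypothetical stand-in for a class the standard library knows nothing about, and `unique_labels` is a made-up name. It is not how Rcpp itself handles `String`.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a non-standard class (the role that Rcpp's
// String plays in the question). Not part of Rcpp or any real library.
struct Label {
    std::string text;
};

// The set also needs equality, to tell apart keys that share a bucket.
inline bool operator==(const Label& a, const Label& b) {
    return a.text == b.text;
}

// Specializing std::hash for a user-defined type is allowed, and it lets
// std::unordered_set<Label> work without extra template arguments.
namespace std {
template <>
struct hash<Label> {
    size_t operator()(const Label& l) const {
        return hash<string>()(l.text);  // reuse the std::string hash
    }
};
}  // namespace std

// Collect the distinct labels. The iteration order of the result is
// unspecified, just as in the uniqueCC output shown above.
inline std::unordered_set<Label> unique_labels(const std::vector<Label>& x) {
    return std::unordered_set<Label>(x.begin(), x.end());
}
```

Calling `unique_labels` on a vector holding the labels "a", "b", "a" yields a set of two elements. In the Rcpp setting, converting to `std::string` as in the answer avoids writing any of this boilerplate, which is why it is the easier option.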