Tags: r, dataframe, rcpp

Prevent Rcpp DataFrames from generating factors


When creating a DataFrame with Rcpp like this:

library(Rcpp)
cppFunction('
    DataFrame testf() {
        CharacterVector character = {"B", "D"};
        return DataFrame::create(Named("foo") = character);
    }')

In the resulting data frame, the character vector is always converted to a factor:

df <- testf()
print(class(df$foo))
# > [1] "factor"

Is there a way to prevent this from within C++? In R itself, this is possible with the stringsAsFactors = FALSE argument.


Solution

  • When I first saw this question, I thought it would surely be a duplicate, but after searching I don't think it is! This has, however, been addressed on the Rcpp-devel mailing list. I'll show that approach here: you add a named element stringsAsFactors set to false, just as you would pass that argument in R:

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
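    DataFrame testf(){
        CharacterVector character = {"B","D"};
        // "stringsAsFactors" is not kept as a column: Rcpp removes it and
        // forwards its value as the stringsAsFactors argument of as.data.frame()
        return(DataFrame::create(Named("foo")=character,
                                 Named("stringsAsFactors") = false));
    }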
    

    Then in action:

    Rcpp::sourceCpp("so.cpp")
    options(stringsAsFactors = TRUE)
    df <- testf()
    print(class(df$foo))
    # [1] "character"
    

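    You may notice that I explicitly set the global default of stringsAsFactors to TRUE via options(). That's because as of R 4.0.0 (which I'm currently running on my laptop), the default is no longer TRUE but FALSE, so without that line the column would stay character anyway and the example would not show the effect of the C++ change.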