I am trying to loop through a data frame in Rcpp and concatenate word blocks that are separated by a space.
I read some answers on Stack Overflow and I am thoroughly confused about how strings are concatenated in Rcpp (e.g., Concatenate StringVector with Rcpp).
I know that in C++ you can just use the + operator to concatenate strings.
This is my Rcpp function:
cppFunction('
Rcpp::StringVector formTextBlocks(DataFrame frame) {
  #include <string>
  using namespace Rcpp;
  NumericVector frame_x = as<NumericVector>(frame["x"]);
  LogicalVector space = as<LogicalVector>(frame["space"]);
  Rcpp::StringVector text=as<StringVector>(frame["text"]);
  if (text.size() == 0) {
    return text;
  }
  int dfSize = text.size();
  for(int i = 0; i < dfSize; ++i) {
    if ( i !=dfSize ) {
      if (space[i]==true) {
        text[i]=text[i] + text[i+1] ;
      }
    }
  }
  return text;
}
')
The error is along the lines of: error: no match for 'operator+'
How can strings be concatenated inside a loop?
Since operator+ is defined for std::string, it is easiest to just use that by converting the text column to std::vector<std::string> instead of Rcpp::StringVector:
Rcpp::cppFunction('
std::vector<std::string> formTextBlocks(DataFrame frame) {
  LogicalVector space = as<LogicalVector>(frame["space"]);
  // Convert the column to std::string so that operator+ is available.
  std::vector<std::string> text = as<std::vector<std::string>>(frame["text"]);
  if (text.size() == 0) {
    return text;
  }
  int dfSize = text.size();
  // Stop at dfSize - 1 so that text[i+1] stays in bounds.
  for (int i = 0; i < dfSize - 1; ++i) {
    if (space[i] == true) {
      text[i] = text[i] + text[i+1];
    }
  }
  return text;
}
')
set.seed(20191129)
textBlock <- data.frame(space = sample(c(TRUE, FALSE), 100, replace = TRUE),
                        text = sample(LETTERS, 100, replace = TRUE),
                        stringsAsFactors = FALSE)
formTextBlocks(textBlock)
#> [1] "B" "N" "G" "BM" "M" "O" "C" "F" "OQ" "Q" "FH" "H" "D" "HK" "KH"
#> [16] "H" "S" "LX" "XO" "OY" "Y" "E" "VD" "D" "TN" "N" "LL" "LQ" "Q" "F"
#> [31] "XX" "X" "S" "R" "P" "L" "M" "GK" "KD" "DD" "D" "H" "M" "M" "K"
#> [46] "N" "GP" "PG" "G" "P" "G" "O" "N" "NY" "Y" "OX" "X" "LX" "XF" "FS"
#> [61] "SE" "E" "PS" "S" "YD" "D" "F" "Z" "H" "ZN" "N" "OM" "M" "XH" "HV"
#> [76] "V" "OX" "X" "J" "BZ" "Z" "FZ" "ZE" "E" "SV" "V" "G" "F" "DZ" "ZF"
#> [91] "F" "PB" "B" "K" "N" "U" "B" "PV" "V" "C"
Created on 2019-11-29 by the reprex package (v0.3.0)
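Notes:
- I removed the #include and using lines. These are not necessary and do not belong inside the function definition.
- I removed the i != dfSize test, which is never false anyway.
- I changed the loop bound to dfSize - 1 so that the loop does not go out of bounds with i+1.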