Custom tokenizer in tm package R not working

Please see the MWE below. The custom tokenizer I defined is ignored when building the document-term matrix. Why? The tm package version is 0.71.

library(tm)

ts <- c("This is a testimonial")
corpDs <- Corpus(VectorSource(ts))

# This does not work: the custom tokenizer is ignored
ownTokenizer <- function(x) unlist(strsplit(as.character(x), "i+"))
tdm <- DocumentTermMatrix(corpDs, control = list(tokenize = ownTokenizer))
as.matrix(tdm)

# This works: calling the tokenizer directly splits as expected
ownTokenizer(ts)

Output:

    Terms
Docs testimonial this
   1           1    1

[1] "Th" "s " "s a test" "mon" "al"

Thank you,

Tobias

Solution

I know this is somewhat stale now, but maybe it still helps others: you have to replace corpDs <- Corpus(...) with corpDs <- VCorpus(...). As the tm documentation states in the TermDocumentMatrix description, "SimpleCorpus" corpora are always tokenized with a fixed tokenizer and accept no customization. The same seems to apply to corpora created with Corpus().
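
A minimal sketch of that change, assuming tm 0.7 or later is installed. The wordLengths setting is an addition that is not part of the original question: by default tm drops terms shorter than three characters, which would hide most of the fragments this tokenizer produces. The last line only inspects the class that Corpus() returns on your installation, so you can check whether it is a "SimpleCorpus".

```r
library(tm)

ts <- c("This is a testimonial")

# VCorpus() instead of Corpus(): control options such as a custom
# tokenizer are then passed through when the matrix is built.
corpDs <- VCorpus(VectorSource(ts))

# Same tokenizer as in the question: split on runs of the letter "i".
ownTokenizer <- function(x) unlist(strsplit(as.character(x), "i+"))

dtm <- DocumentTermMatrix(
  corpDs,
  control = list(
    tokenize = ownTokenizer,
    # Assumption for illustration: keep short tokens too, since the
    # default lower bound of 3 characters would drop some of them.
    wordLengths = c(1, Inf)
  )
)
as.matrix(dtm)

# For comparison, check which corpus class Corpus() creates here.
class(Corpus(VectorSource(ts)))
```

If the terms in the matrix are fragments of the sentence rather than whole words, the custom tokenizer is being applied.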
