I want to use tidymodels to build a workflow for an NLP problem. I have a basic flow built in the traditional way using the naivebayes package, which basically feeds a document-term matrix (counts of terms occurring in each document) to the multinomial_naive_bayes function.
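For concreteness, the traditional flow described above might look roughly like the sketch below. The toy matrix, term names, and labels are made up for illustration and are not from my real data; laplace = 1 is an arbitrary choice of smoothing.

```r
library(naivebayes)

# Toy document-term matrix (hypothetical): rows are documents,
# columns are counts of each term in that document.
dtm <- matrix(
  c(2, 0, 1,
    3, 1, 0,
    0, 2, 2,
    0, 3, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("good", "bad", "film"))
)
labels <- factor(c("pos", "pos", "neg", "neg"))

# Fit the multinomial model directly on the count matrix.
# laplace is the additive smoothing parameter.
fit <- multinomial_naive_bayes(x = dtm, y = labels, laplace = 1)

# New documents must use the same columns as the training matrix.
new_docs <- matrix(
  c(1, 0, 1,
    0, 2, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, colnames(dtm))
)
pred <- predict(fit, newdata = new_docs, type = "class")
pred
```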
While there is a parsnip interface for the naivebayes package, it only seems to work with the generic naive_bayes function. According to the naivebayes documentation, the multinomial model seems to be the only format that can't be accessed through the generic function:
Please note that the Multinomial Naive Bayes is not available through the naive_bayes function.
So... my 3 questions are:
1. Can I access the multinomial_naive_bayes function using parsnip?
2. Can I use the generic naive_bayes function with data in this format (counts of features)?
3. parsnip also supports h2o and klaR, but I'm not familiar with those packages.

I'm expecting the answers to Qs 1 & 2 are "no", but worth checking. Advice on Q3 would be welcome.
I'm expecting the answers to Qs 1 & 2 are "no", but worth checking.
Correct. We don't have engines for those. You could open an issue asking for them to be added, though.
Advice on Q3 would be welcome.
Check out the textrecipes package. It might get you to where you want to be in terms of processing the text, and it would work seamlessly with the engines that tidymodels supports at the moment. That package is excellent and has many capabilities that would otherwise be a pain to use.
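As a rough illustration of that suggestion, here is a minimal sketch of a textrecipes-based workflow. Several things here are assumptions rather than part of the question: the toy data, the column names text and label, the use of the discrim package (which supplies the naive_Bayes() model specification for parsnip), and the choice of max_tokens and Laplace values. Note that this goes through the generic naive_bayes engine, not the multinomial model.

```r
library(tidymodels)
library(textrecipes)
library(discrim)  # assumed: provides naive_Bayes() for parsnip

# Hypothetical toy data: one text column and a class label.
docs <- tibble(
  text  = c("good film", "great good film", "really good story",
            "bad film", "bad boring film", "really bad story"),
  label = factor(c("pos", "pos", "pos", "neg", "neg", "neg"))
)

# Tokenize the text and turn it into term-count columns,
# i.e. the recipe builds the document-term matrix for you.
rec <- recipe(label ~ text, data = docs) %>%
  step_tokenize(text) %>%
  step_tokenfilter(text, max_tokens = 100) %>%  # cap vocabulary size
  step_tf(text)                                 # raw term counts

# Generic naive Bayes via the naivebayes engine (not multinomial).
nb_spec <- naive_Bayes(Laplace = 1) %>%
  set_mode("classification") %>%
  set_engine("naivebayes")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(nb_spec)

wf_fit <- fit(wf, data = docs)

# New text goes through the same preprocessing before prediction.
preds <- predict(wf_fit, new_data = tibble(text = "good film"))
preds
```

Because the preprocessing lives in the recipe, the same workflow object can be passed to the resampling and tuning functions without rebuilding the term counts by hand.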