Tags: r, nlp, naivebayes, tidymodels, parsnip

Using parsnip to call multinomial_naive_bayes


I want to use tidymodels to build a workflow for an NLP problem. I have a basic flow built the traditional way with the naivebayes package: it feeds a document-term matrix (counts of terms occurring in each document) to the multinomial_naive_bayes function.
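
For context, a flow of that kind might look roughly like the sketch below. The toy document-term matrix, the labels, and the object names (dtm, labels, nb_fit, new_docs) are made up for illustration and are not taken from the actual project; the calls are the x/y interface of multinomial_naive_bayes and its predict method.

```r
library(naivebayes)

# Toy document-term matrix (made-up data): rows are documents,
# columns are term counts. Column names are the vocabulary.
dtm <- matrix(
  c(2L, 1L, 0L, 0L,
    3L, 0L, 1L, 0L,
    0L, 0L, 2L, 3L,
    0L, 1L, 1L, 2L),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("goal", "match", "vote", "election"))
)
labels <- factor(c("sport", "sport", "politics", "politics"))

# Fit the multinomial model directly on the count matrix,
# with add-one (Laplace) smoothing.
nb_fit <- multinomial_naive_bayes(x = dtm, y = labels, laplace = 1)

# New documents must be a matrix with the same column names.
new_docs <- matrix(
  c(1L, 1L, 0L, 0L,
    0L, 0L, 2L, 1L),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, colnames(dtm))
)
pred <- predict(nb_fit, newdata = new_docs, type = "class")
print(pred)
```

The point of the sketch is that the function takes a matrix and a label vector rather than a formula and a data frame, which is the interface the generic naive_bayes function uses.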

While there is a parsnip interface for the naivebayes package, it only seems to work with the generic naive_bayes function. According to the naivebayes documentation, the multinomial model seems to be the only variant that can't be accessed through the generic function:

Please note that the Multinomial Naive Bayes is not available through the naive_bayes function.

So... my 3 questions are:

  1. Is there a way to access the multinomial_naive_bayes function using parsnip?
  2. Is there a way to use the generic naive_bayes function with data in this format (counts of features)?
  3. What's the best alternative? I see parsnip also supports h2o and klaR but I'm not familiar with those packages.

I'm expecting the answers to Qs 1 & 2 are "no", but worth checking. Advice on Q3 would be welcome.


Solution

  • I'm expecting the answers to Qs 1 & 2 are "no", but worth checking.

    Correct. We don't have engines for either of those. You could open an issue asking for them to be added, though.

    Advice on Q3 would be welcome.

    Check out the textrecipes package. It might get you to where you want to be in terms of processing the text, and it would work seamlessly with the engines that tidymodels supports at the moment. The package is excellent and has many capabilities that would otherwise be a pain to implement.
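
As a rough illustration of that suggestion, the sketch below tokenizes text with textrecipes and fits the generic naive Bayes model through parsnip. Several things here are assumptions rather than part of the answer: the toy data, the object names, the choice of step_tokenfilter and step_tf, and the usekernel = FALSE engine argument (which asks naivebayes for Gaussian rather than kernel density estimates). Note that this fits the generic naive_bayes model on term counts, not the multinomial model, so results will differ from multinomial_naive_bayes.

```r
library(tidymodels)
library(textrecipes)
library(discrim)  # provides the naive Bayes engines for parsnip

# Toy labelled documents (made-up data).
docs <- tibble(
  text = c("late goal wins the match", "striker scores another goal",
           "voters head to the election", "the election vote is close"),
  label = factor(c("sport", "sport", "politics", "politics"))
)

# Turn raw text into term-frequency columns.
rec <- recipe(label ~ text, data = docs) %>%
  step_tokenize(text) %>%
  step_tokenfilter(text, max_tokens = 50) %>%
  step_tf(text)

# Generic naive Bayes via the naivebayes engine (assumed engine argument:
# usekernel = FALSE, i.e. Gaussian densities for the count columns).
nb_spec <- naive_bayes() %>%
  set_mode("classification") %>%
  set_engine("naivebayes", usekernel = FALSE)

wf_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(nb_spec) %>%
  fit(data = docs)

# The recipe is applied to new text automatically at prediction time.
new_docs <- tibble(text = c("a goal in the match", "an election vote"))
preds <- predict(wf_fit, new_data = new_docs)
print(preds)
```

With a corpus this small the fit is only a smoke test; on real data the same workflow object can be passed to the usual resampling and tuning functions.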