python machine-learning scikit-learn naivebayes tfidfvectorizer

How to fit a MultinomialNB with more than one feature vector?


I'm very new to both ML and Stack Overflow, so I apologize in advance if this is a dumb question or if I break any rules.

I have two different string features, title and article. I built a TfidfVectorizer for the title corpus in the following way and trained a MultinomialNB on it:

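from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# TitleString holds the title corpus
titleVector = TfidfVectorizer()
titleVectorArray = titleVector.fit_transform(TitleString).toarray()
model = MultinomialNB()
model.fit(titleVectorArray, label_train)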

I then did the same for the article corpus:

# ArticleString holds the article corpus
ArticleVector = TfidfVectorizer()
ArticleVectorArray = ArticleVector.fit_transform(ArticleString).toarray()
model_2 = MultinomialNB()
model_2.fit(ArticleVectorArray, label_train)

Is there any way I can use both titleVectorArray and ArticleVectorArray together to train a single MultinomialNB model?

I know I can join the two corpora together and then compute the feature vector, but I don't really understand what result that method would produce, because I want the model to treat the two vectors as two separate features. Also, how can I implement this in sklearn?

I'd really appreciate any kind of help.


Solution

  • Each vectorizer returns a matrix of features with one row per document. If you concatenate the two matrices column-wise, each row holds a set of features that uses both the information from the title and the information from the article. So just do something like:

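    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # TitleString and ArticleString are the two corpora, aligned row by row
    titleVector = TfidfVectorizer()
    titleVectorArray = titleVector.fit_transform(TitleString).toarray()
    ArticleVector = TfidfVectorizer()
    ArticleVectorArray = ArticleVector.fit_transform(ArticleString).toarray()
    model = MultinomialNB()
    # Join column-wise (axis=1): each row holds article features followed by title features
    model.fit(np.concatenate((ArticleVectorArray, titleVectorArray), axis=1), label_train)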
    

    That said, this is not guaranteed to improve the quality of your model. Joining the title and article text into a single corpus also looks natural in this case, so try the approach you described in the question as well.
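
As a rough sketch of the column-wise concatenation idea, here is one way it might look when the TF-IDF output is kept sparse instead of being converted with .toarray(). This swaps in scipy.sparse.hstack, which is not part of the original answer, and the toy titles, articles, and labels are made-up stand-ins for TitleString, ArticleString, and label_train. It also shows that new documents should pass through the already-fitted vectorizers with transform, in the same column order used for training.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for TitleString, ArticleString and label_train.
titles = [
    "cheap pills online",
    "council meeting minutes",
    "win money now",
    "school board vote",
]
articles = [
    "buy cheap pills online today",
    "the council met to discuss the budget",
    "click here to win money fast",
    "the school board voted on the new budget",
]
labels = [1, 0, 1, 0]

# One vectorizer per text field, so each field keeps its own vocabulary.
title_vec = TfidfVectorizer()
article_vec = TfidfVectorizer()

# Stack column-wise: each row holds its title features followed by its article features.
X_train = hstack(
    [title_vec.fit_transform(titles), article_vec.fit_transform(articles)]
).tocsr()

model = MultinomialNB()
model.fit(X_train, labels)

# New documents go through the fitted vectorizers (transform, not fit_transform),
# stacked in the same order as during training.
new_titles = ["win cheap pills"]
new_articles = ["click to buy pills online"]
X_new = hstack(
    [title_vec.transform(new_titles), article_vec.transform(new_articles)]
).tocsr()
prediction = model.predict(X_new)
```

Keeping the matrices sparse mainly matters for larger corpora, where a dense array of TF-IDF values can use a lot of memory. The model sees the same features either way.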