How to get the sum of any given column in the term frequency matrix returned by sklearn CountVectorizer?
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = [ 'This is a sentence',
'Another sentence is here',
'Wait for another sentence',
'The sentence is coming',
'The sentence has come'
]
x = vectorizer.fit_transform(corpus)
For example, I want to find the frequency of the term 'sentence' in the matrix, so I want the sum of the 'sentence' column. I couldn't figure out a way to do this. I tried:
x['sentence'].sum()
but that didn't help.

You can try the following:
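Get the index of the term from the get_feature_names() list from CountVectorizer (get_feature_names_out() in scikit-learn 1.0 and later), then sum the column at that index in the term frequency matrix (x, in your case).

Code: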
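import numpy as np
term_to_sum = 'sentence'
# get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() returns an array, so convert it to a list to use .index()
index_term = list(vectorizer.get_feature_names_out()).index(term_to_sum)
s = np.sum(x[:, index_term])  # sum of the 'sentence' column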
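
If you need the totals for several terms, it may be simpler to sum every column once and look terms up afterwards. The following is only a sketch of that idea, not part of the snippet above: it assumes the same corpus as in the question and uses the fitted vectorizer's vocabulary_ mapping (term to column index) instead of searching the feature-name list. The names col, totals and term_totals are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is a sentence',
    'Another sentence is here',
    'Wait for another sentence',
    'The sentence is coming',
    'The sentence has come',
]

vectorizer = CountVectorizer()
x = vectorizer.fit_transform(corpus)  # sparse document-term matrix

# vocabulary_ maps each term to its column index in x
col = vectorizer.vocabulary_['sentence']
sentence_count = int(x[:, col].sum())

# Sum all columns at once; x.sum(axis=0) returns a 1 x n_terms matrix,
# so flatten it into a plain 1-D array
totals = np.asarray(x.sum(axis=0)).ravel()
term_totals = {term: int(totals[idx]) for term, idx in vectorizer.vocabulary_.items()}

print(sentence_count)        # total count of 'sentence' across the corpus
print(term_totals['is'])     # total count of 'is'
```

Note that x['sentence'] fails because x is a SciPy sparse matrix indexed by integer position, not by term, so the term has to be translated to a column index first.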