pandas, sklearn-pandas

How to apply CountVectorizer to a column of a dataset?


I have been able to use CountVectorizer on single text strings, but my dataset has 80,000 rows. How can I apply CountVectorizer to every entry in a single column? I have tried the following:

from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer(lowercase=False)
cv = count_vect.fit_transform(df['Tokenized_Review'])

Thank you all in advance!


Solution

  • Thank you, everyone, for your time. Turns out this will do the trick:

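    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer()
    vocabularies = []
    # Assumes each entry of df['Tokenized_Review'] is a list of token strings.
    for tokens in df['Tokenized_Review']:
        vectorizer.fit(tokens)  # refit on this row only
        vocabularies.append(vectorizer.vocabulary_)  # token -> column index
    df['Vectorized'] = vocabularies  # assign once, no chained indexing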