python, pandas, nlp, word-frequency

Efficient way to create a Term Frequency Matrix from a Pandas DataFrame


I have a pandas DataFrame with two columns: column 1 is the user name, and column 2 is the content linked to that user.

[image: example DataFrame]

How does one create a Term Frequency Matrix that looks like the following?

[image: desired term frequency matrix]

My attempt: [image]

This seems to work, but I want the final matrix to show column and row names.


Solution

  • What if you convert it to a dataframe again?

    pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())