How to get the value in TFIDF transformer?

I'm new to Python and recently learning text processing using Bag of Words and TFIDF.

I was trying to get the word in column 1001 of my TFIDF matrix using the following code:

count_vectorizer = CountVectorizer()
bag_of_words = count_vectorizer.fit_transform(df)

TFIDF_transformer = TfidfTransformer(norm = 'l2')
TFIDF_representation = TFIDF_transformer.fit_transform(bag_of_words)

TFIDF_transformer.get_feature_names_out()[1000]

and the output is "x1000", a token (I assume) instead of a word.

How can I get the exact word in column 1001 in my TFIDF? Am I using the wrong function or missing other steps to interpret the token I get?

Solution

CountVectorizer returns a sparse matrix, which has no column names, so TfidfTransformer has no words to report and falls back to placeholder names such as "x1000". Convert the matrix to a DataFrame and use the words from the CountVectorizer as its column names:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

count_vectorizer = CountVectorizer()
bag_of_words = count_vectorizer.fit_transform(df)

# Turn the sparse matrix into a dense pandas DataFrame and use the words/tokens as column names
bag_of_words = pd.DataFrame(bag_of_words.toarray(), columns=count_vectorizer.get_feature_names_out())

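TFIDF_transformer = TfidfTransformer(norm='l2')
TFIDF_representation = TFIDF_transformer.fit_transform(bag_of_words)

# The transformer now knows the column names, so this returns a word instead of "x1000"
TFIDF_transformer.get_feature_names_out()[1000]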

Alternatively, if you only need the TF-IDF vectors, it is probably simpler to use TfidfVectorizer directly instead of CountVectorizer followed by TfidfTransformer:

from sklearn.feature_extraction.text import TfidfVectorizer

TFIDF = TfidfVectorizer()
TFIDF_representation = TFIDF.fit_transform(df)

# Call this on the TfidfVectorizer itself; it returns the words in column order
TFIDF.get_feature_names_out()[1000]