The example code in the scikit-learn documentation suggests applying CountVectorizer first and then TF-IDF on top of it. Can I use only TF-IDF? http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
If I use only TF-IDF and pass my preprocessed texts as input, it does not accept the data type (I tried both a list and a NumPy array). Can someone help?
The tutorial chains a CountVectorizer and a TfidfTransformer. Using TfidfVectorizer directly produces the same result. Thus, it is up to you to choose which weighting scheme you want. If your texts are already preprocessed, you can pass your own callables through tokenizer= and preprocessor=. What is your issue here?
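To make the comparison concrete, here is a minimal sketch that checks the two routes against each other on a toy corpus and then feeds already-tokenized documents to TfidfVectorizer. The corpus, the `identity` helper, and the assumption that "preprocessed texts" means lists of tokens are all made up for illustration; if your preprocessed texts are plain strings, a list of strings can be passed directly, without the extra arguments.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

# Two-step route: raw term counts, then TF-IDF weighting.
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

# One-step route: TfidfVectorizer on the raw strings.
one_step = TfidfVectorizer().fit_transform(docs)

# With default settings both routes give the same matrix.
same_result = np.allclose(two_step.toarray(), one_step.toarray())


def identity(tokens):
    """Hypothetical helper: return the token list unchanged."""
    return tokens


# Assumption: the preprocessed texts are lists of tokens, not strings.
token_lists = [doc.split() for doc in docs]

pretokenized = TfidfVectorizer(
    tokenizer=identity,      # skip the built-in tokenization
    preprocessor=identity,   # skip the built-in string preprocessing
    token_pattern=None,      # unused once a custom tokenizer is given
)
pretokenized_matrix = pretokenized.fit_transform(token_lists)

print(same_result)
print(pretokenized_matrix.shape)
```

Passing token lists without these two arguments would likely fail, because the default preprocessor expects each document to be a string. That may be the data type problem described in the question, although the exact error message would confirm it.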