I want to compute the cosine similarity of two documents of very different lengths (say one is a one- or two-liner while the other is 100-200 lines).
I need a way to normalize the tf-idf or count vectors in scikit-learn so the length difference doesn't skew the result.
TfidfVectorizer has a norm parameter
(see the docs) that deals with this issue. Try, for example, something like this:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(analyzer='word', stop_words='english', norm='l2')
This will L2-normalize each document vector, which accounts for differences in document length (norm='l2' is actually the default, but it is spelled out here for clarity).
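For illustration, here is a minimal, self-contained sketch (the two example documents and variable names are made up) showing that with L2-normalized rows you can compare a very short and a much longer document directly from the tf-idf matrix:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical documents of very different lengths
short_doc = "python machine learning"
long_doc = " ".join(["scikit-learn makes machine learning in python approachable"] * 50)

vectorizer = TfidfVectorizer(analyzer='word', stop_words='english', norm='l2')
tfidf = vectorizer.fit_transform([short_doc, long_doc])

# Each row has unit L2 norm, so the cosine similarity is just the dot
# product of the two row vectors; cosine_similarity computes the same thing.
print(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

Because each row already has unit length, the number of tokens in a document no longer affects the magnitude of its vector, only its direction, which is exactly what cosine similarity compares.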