If I use scikit-learn to configure a CountVectorizer object and pass a matrix M of n sentences (of varying length) to the fit_transform function, I can, for example, obtain an n-gram representation F. Like this:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=1,
                             max_features=2000,
                             ngram_range=(2, 2),
                             analyzer="word")
F = vectorizer.fit_transform(M)
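This works well. F will now have the shape (n, 2000), one row per sentence and one column per feature, because I've set max_features to 2000 (assuming the sentences contain at least 2000 distinct bigrams).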
But let's say that I obtain one more sentence and would like to generate a vector that aligns with the features of F and has the same length (2000). Is this even possible, or do I need to keep the original matrix M, add the new sentence to it, and then regenerate all the features?
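If I understand what you are asking, you can transform additional sentences using vectorizer.transform(['New sentence here']). This reuses the vocabulary learned during fit_transform, so the resulting vector has the same columns as F, and you do not need to keep M or refit.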
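As a minimal sketch of the above: the three-sentence corpus and the new sentence below are made up for illustration and stand in for the real M. The vectorizer settings are the ones from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the original sentences M.
M = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

vectorizer = CountVectorizer(min_df=1,
                             max_features=2000,
                             ngram_range=(2, 2),
                             analyzer="word")

# fit_transform learns the bigram vocabulary and returns one row per sentence.
F = vectorizer.fit_transform(M)

# transform reuses the learned vocabulary without refitting.
# Bigrams not seen during fitting (here "the sofa") are simply ignored.
new_vec = vectorizer.transform(["the cat sat on the sofa"])

# Both matrices have the same number of columns, in the same feature order.
print(F.shape, new_vec.shape)
```

Only the fitted vectorizer is needed to encode later sentences, so it should be enough to keep that object around (for example by pickling it) rather than the original M.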