I'm trying to train a classifier on text from a chat between two users, so that later on I can predict which of the two users is more likely to say a given sentence or word. To get there I mined the text from the chat log and ended up with two arrays of words, `UserA_words` and `UserB_words`.
Into which format do I have to transform these arrays to pass them to a classifier such as Naive Bayes or an SVM? How do I pass, e.g., a bag-of-words representation to a classifier?
You're asking what ML representation you should use to classify chat text by user.
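On the literal "how do I pass it to a classifier" part, here is a minimal sketch, assuming scikit-learn (the question does not name a library). Most classifiers expect a matrix with one row per sample and one column per feature, plus a vector of labels, so it usually helps to keep one string per chat message rather than two flat word arrays. The messages, labels, and the `predict_user` helper below are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data: one string per chat message, one label per message.
messages = [
    "lol that was great",
    "lol see you later",
    "great lol",
    "please review the report",
    "the meeting is at noon",
    "please send the report",
]
labels = ["A", "A", "A", "B", "B", "B"]

# Bag of words: each row is a message, each column counts one vocabulary word.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)  # sparse matrix, shape (n_messages, n_words)

# Multinomial Naive Bayes works directly on these word counts.
clf = MultinomialNB()
clf.fit(X, labels)

def predict_user(text):
    """Return the label of the user more likely to have written `text`."""
    # Reuse the fitted vocabulary; words never seen in training are ignored.
    return clf.predict(vectorizer.transform([text]))[0]

print(predict_user("lol great"))
```

Swapping `MultinomialNB` for `sklearn.svm.LinearSVC` should work on the same `X`, since both accept sparse count matrices.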
Bag-of-words and word vectors are the main representations generally used in text processing. However, user classification of chat is not the usual text-processing task: here we look for telltale features indicative of a specific user. Here are some: