svm, knn, document-classification, text-classification

How can I classify text documents using SVM and KNN?


Almost all of the examples are based on numbers. In text documents I have words instead of numbers.

So can you show me simple examples of how to use these algorithms for text document classification?

I don't need a code example, just the logic.

Pseudocode would help greatly.


Solution

  • The common approach is to use a bag-of-words model (http://en.wikipedia.org/wiki/Bag_of_words_model): each document is turned into a vector that records which words occur in it (or how often), and the classifier learns from those vectors instead of from the raw text. It is simple but works surprisingly well.

    There is also a similar question: Prepare data for text classification using Scikit Learn SVM
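To make the logic concrete, here is a rough sketch in plain Python of how words become numbers and how KNN and a linear SVM can then use them. Everything in it is an assumption made for illustration: the toy texts and labels, the tiny stop-word list, and the function names. The SVM part is a simplified linear SVM trained by sub-gradient descent on the hinge loss, with no bias term, which is probably not what a library would do internally.

```python
import math
from collections import Counter

# Hypothetical stop-word list: very common words that carry no topic information.
STOP_WORDS = {"the", "a", "in", "on", "were"}

def tokenize(text):
    # Lowercase, split on whitespace, drop stop words.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def build_vocabulary(docs):
    # Give every distinct word in the training texts its own column index.
    vocab = {}
    for doc in docs:
        for word in tokenize(doc):
            vocab.setdefault(word, len(vocab))
    return vocab

def vectorize(doc, vocab):
    # Bag of words: one count per vocabulary word; unknown words are ignored.
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(doc)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn_predict(train_vecs, train_labels, vec, k=3):
    # KNN: find the k most similar training documents and let them vote.
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(pair[0], vec), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def train_linear_svm(vecs, signs, epochs=200, lr=0.1, lam=0.01):
    # signs must be +1 or -1. Simplification: no bias term.
    w = [0.0] * len(vecs[0])
    for _ in range(epochs):
        for x, y in zip(vecs, signs):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization shrinks the weights a little on every step.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # Wrong side of, or inside, the margin: move toward the example.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def svm_predict(w, vec):
    # The sign of the weighted word counts decides the class.
    return 1 if sum(wi * xi for wi, xi in zip(w, vec)) >= 0 else -1

train_texts = [
    "the team won the football match",
    "a great goal in the second half",
    "the player scored in the match",
    "the election results were announced",
    "parliament passed the new law",
    "the minister gave a speech on the law",
]
train_labels = ["sport", "sport", "sport", "politics", "politics", "politics"]

vocab = build_vocabulary(train_texts)
train_vecs = [vectorize(t, vocab) for t in train_texts]
query = vectorize("the player scored a goal in the match", vocab)

knn_label = knn_predict(train_vecs, train_labels, query, k=3)

signs = [1 if label == "sport" else -1 for label in train_labels]
w = train_linear_svm(train_vecs, signs)
svm_label = "sport" if svm_predict(w, query) == 1 else "politics"

print(knn_label, svm_label)
```

On this toy data both classifiers label the query as sport. The key point is that neither algorithm ever sees words: KNN compares count vectors by similarity, and the SVM learns one weight per vocabulary word.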