machine-learning, scikit-learn, svm, random-forest, document-classification

Algorithm for Multi-Class Classification of News Articles


I want to classify news articles into the category they belong to. I have 4 categories of news: Technology, Sports, Politics and Health. I have collected around 50 documents for each category as a training set.

**Is this training data enough for classification? And which algorithm should I use: SVM, Random Forest, k-NN, or something else?**

I am using the scikit-learn (http://scikit-learn.org/) Python library for my task.

Thanks


Solution

  • There are many ways to attack this problem, from CRFs to Random Forests.

    With your limited training data, I would suggest going with a high-bias (low-variance) model such as a linear SVM, since it is less likely to overfit 50 documents per class. Start by training one-vs-all models, one per class, and predict the class whose model gives the highest score (a linear SVM outputs a decision value rather than a true probability). This will give you a baseline for how hard your problem is with the given training data.
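
A minimal sketch of that baseline in scikit-learn, under a few assumptions that are not in the question: TF-IDF features are used to turn text into vectors, the tiny `docs`/`labels` corpus is made up purely so the snippet runs, and 3-fold cross-validation is an arbitrary choice for estimating accuracy. `LinearSVC` trains one-vs-rest models by default, so the class with the largest decision value is the prediction.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; replace with your ~50 articles per category.
docs = [
    "new smartphone chip makes laptops and software faster",
    "startup releases open source software for cloud computers",
    "researchers build faster processor chip for computers",
    "the team won the football match with a late goal",
    "tennis player wins the final match of the tournament",
    "coach praises team after the basketball game victory",
    "parliament votes on the new election law",
    "the president and the senate debate the government budget",
    "opposition party wins seats in the national election",
    "doctors recommend vaccine to prevent the flu virus",
    "hospital study links diet and exercise to heart disease",
    "new drug treatment helps patients with the virus",
]
labels = ["Technology"] * 3 + ["Sports"] * 3 + ["Politics"] * 3 + ["Health"] * 3

# TF-IDF features followed by a linear SVM (one-vs-rest by default).
model = make_pipeline(
    TfidfVectorizer(sublinear_tf=True, stop_words="english"),
    LinearSVC(C=1.0),
)

# Stratified 3-fold cross-validation gives a rough baseline accuracy.
scores = cross_val_score(model, docs, labels, cv=3)
print("mean CV accuracy:", scores.mean())

# Fit on everything, then pick the class with the highest decision value.
model.fit(docs, labels)
new_docs = ["the senate passed the election budget law"]
margins = model.decision_function(new_docs)  # shape: (n_docs, n_classes)
predicted = model.classes_[margins.argmax(axis=1)]
print(predicted)
```

If the cross-validated accuracy on your real data is already high, 50 documents per class may be enough; if not, collecting more articles or tuning `C` would be the next things to try before moving to more complex models.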