python machine-learning text-classification multilabel-classification

I want to implement a machine learning or deep learning model for text classification (100 classes)


I have a dataset similar to one that contains movie plots and their genres. There are around 100 classes. Which algorithm should I choose for this 100-class classification? The classification is multi-label, because one movie can have multiple genres. Please recommend one of the following; you are free to suggest any other model as well.

1. Naive Bayes
2. Neural networks
3. SVM
4. Random forest
5. k-nearest neighbours

It would be useful if you could also name the necessary Python library.


Solution

  • An important step in machine learning engineering is inspecting the data properly. This gives you insight that helps determine which algorithm to choose. You may also want to try more than one algorithm and compare the resulting models, to be sure you have gotten the most out of the data.
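As a rough illustration of what inspecting a multi-label dataset can look like, the sketch below counts how often each label occurs and how many labels a sample carries on average. The `samples` list and the `label_statistics` helper are made up for this example; with real data you would load your own plots and genres instead.

```python
from collections import Counter

# Hypothetical toy data: each sample is (plot text, list of genres).
samples = [
    ("A detective hunts a killer in a rainy city", ["crime", "thriller"]),
    ("Two friends take a road trip and fall in love", ["comedy", "romance"]),
    ("A crew fights aliens on a distant planet", ["sci-fi", "action"]),
    ("A couple faces a haunted house", ["horror"]),
    ("A cop and a thief team up for laughs", ["comedy", "crime", "action"]),
]


def label_statistics(samples):
    """Return per-label counts and the average number of labels per sample."""
    label_counts = Counter(label for _, labels in samples for label in labels)
    avg_labels = sum(len(labels) for _, labels in samples) / len(samples)
    return label_counts, avg_labels


label_counts, avg_labels = label_statistics(samples)
print(label_counts.most_common())
print(f"average labels per sample: {avg_labels:.2f}")
```

Very rare labels and a strongly skewed label distribution are worth noticing early, because they affect both the choice of model and the choice of evaluation metric.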

    Since you did not disclose your data, I can only give general advice. If your data is "easy", meaning that only a few features and simple combinations of them are enough to solve the task, use Naive Bayes or k-nearest neighbors. If your data is of "medium" difficulty, use Random Forest or SVM. If solving the task requires a very complicated decision boundary that combines many feature dimensions in a non-linear fashion, choose a neural network architecture.
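Since it is hard to know in advance how "hard" the data is, one option is to let cross-validation decide. The sketch below compares four of the candidate models on TF-IDF features using micro-averaged F1. The tiny `plots` and `genres` lists are invented for illustration, and the hyperparameters are arbitrary choices, so the scores it prints say nothing about real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy data; replace with your own plots and genre lists.
plots = [
    "a detective investigates a murder in the city",
    "two clumsy friends cause hilarious chaos at a wedding",
    "a young couple falls in love during a summer in paris",
    "a clumsy detective bungles a murder investigation",
    "police chase a gang of bank robbers",
    "a hilarious road trip goes wrong for three friends",
    "a widow finds love again with an old friend",
    "a wedding planner falls in love amid hilarious chaos",
    "a gang boss is hunted by the police",
    "a stand-up comic tells hilarious stories",
    "a love letter reunites a couple after years",
    "a robber falls in love with the detective chasing her",
]
genres = [
    ["crime"], ["comedy"], ["romance"], ["crime", "comedy"],
    ["crime"], ["comedy"], ["romance"], ["comedy", "romance"],
    ["crime"], ["comedy"], ["romance"], ["crime", "romance"],
]

# Turn the lists of genres into a binary indicator matrix (one column per genre).
Y = MultiLabelBinarizer().fit_transform(genres)

# k-NN and Random Forest accept indicator matrices directly;
# Naive Bayes and the linear SVM are wrapped one-vs-rest.
candidates = {
    "naive_bayes": OneVsRestClassifier(MultinomialNB()),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "linear_svm": OneVsRestClassifier(LinearSVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    fold_scores = cross_val_score(pipeline, plots, Y, cv=3, scoring="f1_micro")
    scores[name] = float(np.mean(fold_scores))

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: micro-F1 = {score:.2f}")
```

On a dataset this small scikit-learn may emit warnings about labels or predictions missing from a fold; with around 100 classes you would also want to look at per-label scores, not only the micro average.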

    I suggest you use Python and the scikit-learn package for SVM, Random Forest, or k-NN. For neural networks, use Keras.
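To make the scikit-learn suggestion concrete, here is a minimal sketch of a multi-label text classifier: TF-IDF features, one linear SVM per genre via `OneVsRestClassifier`, and `MultiLabelBinarizer` to convert between genre lists and indicator vectors. The training data and variable names are invented for illustration, and the one-vs-rest linear SVM is just one reasonable starting point, not the only way to set this up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical training data: plot texts and their (possibly multiple) genres.
train_plots = [
    "a detective investigates a murder in the city",
    "two clumsy friends cause hilarious chaos at a wedding",
    "a young couple falls in love during a summer in paris",
    "a clumsy detective bungles a murder investigation",
    "a wedding planner falls in love amid hilarious chaos",
    "a robber falls in love with the detective chasing her",
]
train_genres = [
    ["crime"],
    ["comedy"],
    ["romance"],
    ["crime", "comedy"],
    ["comedy", "romance"],
    ["crime", "romance"],
]

# Encode the genre lists as a binary matrix with one column per genre.
binarizer = MultiLabelBinarizer()
Y_train = binarizer.fit_transform(train_genres)

# One independent binary linear SVM is trained for each genre.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
model.fit(train_plots, Y_train)

# Predict genres for an unseen plot and map the indicator row back to names.
new_plots = ["a detective falls in love while solving a murder"]
Y_pred = model.predict(new_plots)
predicted_genres = binarizer.inverse_transform(Y_pred)
print(predicted_genres)
```

Because every genre gets its own binary decision, a plot can receive several genres or none at all, which matches the multi-label setting described in the question.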

    I am sorry that I cannot give you THE recipe you might expect for solving your problem. Your question is posed very broadly.