machine-learning neural-network classification svm logistic-regression

Machine learning model suggestion for large imbalance data

I have data set for classification problem. I have in total 50 classes.

 Class1: 10,000 examples 
 Class2: 10 examples
 Class3: 5 examples 
 Class4: 35 examples
 .
 .
 . 
and so on.

I tried to train my classifier using SVM ( both linear and Gaussian kernel). My accurate is very bad on test data 65 and 72% respectively. Now I am thinking to go for a neural network. Do you have any suggestion for any machine learning model and algorithm for large imbalanced data? It would be extremely helpful to me

Solution

You should provide more information about the data set features and the class distribution, this would help others to advice you. In any case, I don't think a neural network fits here as this data set is too small for it.

Assuming 50% or more of the samples are of class 1 then I would first start by looking for a classifier that differentiates between class 1 and non-class 1 samples (binary classification). This classifier should outperform a naive classifier (benchmark) which randomly chooses a classification with a prior corresponding to the training set class distribution. For example, assuming there are 1,000 samples, out of which 700 are of class 1, then the benchmark classifier would classify a new sample as class 1 in a probability of 700/1,000=0.7 (like an unfair coin toss).

Once you found a classifier with good accuracy, the next phase can be classifying the non-class 1 classified samples as one of the other 49 classes, assuming these classes are more balanced then I would start with RF, NB and KNN.