Are there techniques I could use to classify data with a large number of classes? (I'm using PyTorch.)
I noticed that if I try to build a multi-layer perceptron (MLP) I run out of memory on the GPU, because the last layer needs so many neurons, even though my GPU has 24 GB of memory. I have about 3000 classes.
Is there a way or technique to handle this type of scenario?
Note that I am NOT asking for an opinion on which technique is better. I am asking for an objective list of techniques that could be used in this scenario. This can be answered in a fact-based fashion and can include citations, etc. if needed.
One approach you could try is to split the model into two sub-`nn.Module`s: send the first one to the GPU and keep the last layer(s), i.e. the classifier, on the CPU. You will lose some training speed and overall throughput, but you will be able to handle the huge final layer of the MLP.
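A minimal sketch of this idea, assuming a plain MLP with hypothetical dimensions (`input_dim`, `hidden_dim`, and your 3000 classes); the feature extractor lives on the GPU and the classifier on the CPU, so activations are moved between devices inside `forward`:

```python
import torch
import torch.nn as nn

class SplitMLP(nn.Module):
    # Hypothetical sizes for illustration; adjust to your data.
    def __init__(self, input_dim=1024, hidden_dim=512, num_classes=3000):
        super().__init__()
        # Feature extractor runs on the GPU.
        self.features = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
        ).to("cuda")
        # The large classification head stays on the CPU.
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        h = self.features(x.to("cuda"))
        # Move activations back to the CPU for the big final layer.
        return self.classifier(h.cpu())

model = SplitMLP()
x = torch.randn(32, 1024)
targets = torch.randint(0, 3000, (32,))
loss = nn.functional.cross_entropy(model(x), targets)
loss.backward()  # autograd handles the cross-device hop
```

Autograd tracks the `.to("cuda")`/`.cpu()` transfers, so backpropagation works across the device boundary without any extra code.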
However, it is not common to have such a large number of classes. Are you doing a Computer Vision or NLP task? If so, you could use task-specific network types such as CNNs or LSTMs, which achieve better performance with far fewer parameters (e.g. thanks to pooling layers in CNNs). If you have to use an MLP, try reducing the dimensionality of the penultimate layer: the final weight matrix has `penultimate_dim × num_classes` entries, so shrinking the penultimate layer shrinks the classifier proportionally, as sketched below.
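For example, with made-up dimensions, inserting a narrow bottleneck before the 3000-way head cuts the head's parameter count by roughly a factor of seven:

```python
import torch.nn as nn

num_classes = 3000  # from the question

# Wide penultimate layer: the head alone has 4096 * 3000 + 3000 ≈ 12.3M parameters.
wide_head = nn.Linear(4096, num_classes)

# Bottleneck first: 4096 * 256 + 256 * 3000 (+ biases) ≈ 1.8M parameters in total.
bottleneck_head = nn.Sequential(
    nn.Linear(4096, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)

for m in (wide_head, bottleneck_head):
    print(sum(p.numel() for p in m.parameters()))
```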