Are there tasks a single layer perceptron can do better than a multilayer perceptron? If yes, do you have an example?
Any dataset where the underlying relation is linear but the number of training datapoints is very low will benefit from starting with a linear model. It is a matter of the task plus the amount of data, more than the nature of the task itself.

Another, somewhat contrived example is extrapolation: you train on data in [0, 1] x [0, 1] but for some reason test on values above 1,000,000. If the underlying relation is linear, a linear model should have much lower error in this extreme extrapolation regime, because a nonlinear one can do whatever it "wants" and bend arbitrarily outside [0, 1] x [0, 1].
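A minimal sketch of the extrapolation point, assuming an arbitrarily chosen linear target (the coefficients, sample sizes, and MLP settings below are illustrative, not from the question): fit a linear model and an MLP on a few points inside [0, 1] x [0, 1], then evaluate both far outside that box.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Underlying relation is linear; coefficients are an arbitrary illustration.
def true_fn(X):
    return 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0

# Small training set inside [0, 1] x [0, 1].
X_train = rng.uniform(0.0, 1.0, size=(50, 2))
y_train = true_fn(X_train)

# Test points far outside the training range (extreme extrapolation).
X_test = rng.uniform(1e6, 2e6, size=(200, 2))
y_test = true_fn(X_test)

linear = LinearRegression().fit(X_train, y_train)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000,
                   random_state=0).fit(X_train, y_train)

# Compare mean squared error in the extrapolation regime.
for name, model in [("linear", linear), ("mlp", mlp)]:
    mse = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"{name}: extrapolation MSE = {mse:.3g}")
```

On a run like this you would expect the linear model to recover the true coefficients almost exactly and stay accurate at 1,000,000+, while the MLP's behaviour outside the training box is essentially unconstrained.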