I am new to machine learning and I built a neural network with 2 dense layers. When I was experimenting, I had the following observations:
When I decreased the number of nodes in each dense layer, I seemed to get better training and prediction accuracy. This was surprising to me, because I would have assumed that the more nodes a dense layer has, the better the model can learn the data. Why does decreasing the number of nodes improve accuracy?
The model also yielded better results when the number of nodes differed between the dense layers. For example, I got the best result when one dense layer had 5 nodes and the other had 10, better than when both layers had 5 nodes or both had 10. Why is that? Is it common for unequal node counts across dense layers to improve accuracy?
To answer your questions sequentially:

a) Decreasing the number of nodes reduces the model's capacity (its number of trainable parameters). A smaller model is less able to memorize the training data and is therefore less prone to overfitting, which often improves accuracy on held-out data.

b) Answer a) does not apply if only the training accuracy improved when you decreased the number of nodes, since overfitting increases training accuracy but reduces the test/holdout accuracy.
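As a rough illustration of a) and b), here is a sketch using scikit-learn's MLPClassifier on a small synthetic dataset (an assumption for demonstration purposes, not your actual data or framework). Comparing training accuracy against held-out accuracy as the layer widths grow is the standard way to spot overfitting:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic dataset with only 5 informative features out of 20,
# so a high-capacity network can easily overfit the noise.
X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Vary the widths of the two hidden (dense) layers and compare
# training accuracy with held-out accuracy for each configuration.
for hidden in [(5, 5), (5, 10), (50, 50)]:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000,
                        random_state=0).fit(X_tr, y_tr)
    print(hidden,
          "train:", round(clf.score(X_tr, y_tr), 3),
          "test:", round(clf.score(X_te, y_te), 3))
```

If the wider network's training accuracy is much higher than its test accuracy while the narrow network's two scores are close, that gap is the overfitting described in a) and b).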
For common heuristics applied when building a model from scratch, particularly with Dense layers, see the following link: https://towardsdatascience.com/17-rules-of-thumb-for-building-a-neural-network-93356f9930af. Several of the heuristics apply to Dense layers in general; it does not matter whether their input, as in your problem, comes from an LSTM.
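To make the capacity argument concrete, the sketch below counts the trainable parameters of two stacked Dense layers (a fully connected layer with n_inputs inputs and n_units units has n_inputs * n_units weights plus n_units biases). The 20-feature input is an assumed, illustrative number:

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of one Dense layer: weights + biases."""
    return n_inputs * n_units + n_units

# Assumed example: 20 input features feeding two stacked dense layers.
for l1, l2 in [(5, 5), (5, 10), (10, 10)]:
    total = dense_params(20, l1) + dense_params(l1, l2)
    print((l1, l2), "parameters:", total)
```

Even small changes in layer width change the parameter count noticeably, which is why shrinking the layers can curb overfitting on small datasets.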