My model's performance is poor with a high embedding dimension, and I believe the issue is related to the activation function. I tried replacing tanh() with arctan(), but it still performs poorly. Is there a good strategy for choosing an activation function?
There are various types of activation functions you can employ, and the right choice depends on the task you are working on.
In many cases, ReLU or Leaky ReLU is a good default: unlike tanh() and arctan(), which saturate for large pre-activations and can cause vanishing gradients, ReLU-family activations keep gradients flowing for positive inputs. For more detailed information and benchmarks on activation functions, you can refer to the paper: Activation Functions in Deep Learning: A Comprehensive Survey and Benchmark.
To use these activation functions, see the 'Non-linear Activations' section of the PyTorch torch.nn documentation.
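For example, since your actual architecture isn't shown in the question, here is a minimal sketch assuming a simple feed-forward head over your embeddings (the layer sizes are placeholders), with nn.LeakyReLU() swapped in where nn.Tanh() would have been:

```python
import torch
import torch.nn as nn

# Placeholder dimensions -- adjust to your actual model.
embedding_dim, hidden_dim, num_classes = 512, 256, 10

model = nn.Sequential(
    nn.Linear(embedding_dim, hidden_dim),
    nn.LeakyReLU(negative_slope=0.01),  # in place of nn.Tanh(); does not saturate for positive inputs
    nn.Linear(hidden_dim, num_classes),
)

x = torch.randn(32, embedding_dim)  # a dummy batch of embeddings
out = model(x)
print(out.shape)  # torch.Size([32, 10])
```

Leaky ReLU's small negative slope also keeps a nonzero gradient for negative inputs, which can help avoid "dead" units compared with plain ReLU.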