Tags: tensorflow, pytorch, artificial-intelligence, actor, reinforcement-learning

Reinforcement learning actor predicting same actions during initial training


I have a reinforcement learning actor-critic model with an LSTM. During initial training it predicts the same action for all states.

Can someone experienced in AI/RL tell me whether this is normal behavior during training? Also, what would be a sensible size for the LSTM and linear layers if I have state_dimension = 50 and action_dimension = 3?

Thanks in advance


Solution

  • This can be caused by several things:

    1 - Check your weight initialization.

    2 - Check the interface through which the model performs inference, and make sure nothing other than the activation of that specific output neuron is constraining the action choice.

    3 - Check your reward function. Avoid very large negative rewards, and verify that repeating the same action is not an easy way for the agent to avoid negative rewards.
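To illustrate points 1 and 2, here is a minimal PyTorch sketch of an LSTM actor for the stated dimensions (state_dim = 50, action_dim = 3). The hidden size of 128 and the use of orthogonal initialization with a small gain on the policy head are illustrative assumptions, not something prescribed by the answer; the key ideas are that a small output gain keeps early logits near-uniform, and that sampling from the policy distribution (rather than taking an argmax over untrained logits) avoids deterministically repeating one action:

```python
import torch
import torch.nn as nn


class LSTMActor(nn.Module):
    """Sketch of an LSTM policy network (sizes are illustrative)."""

    def __init__(self, state_dim=50, hidden_dim=128, action_dim=3):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, action_dim)

        # Point 1: explicit weight initialization. Orthogonal init for the
        # recurrent weights, zeros for biases, and a small gain on the policy
        # head so initial logits are close to uniform.
        for name, param in self.lstm.named_parameters():
            if "weight" in name:
                nn.init.orthogonal_(param)
            elif "bias" in name:
                nn.init.zeros_(param)
        nn.init.orthogonal_(self.policy_head.weight, gain=0.01)
        nn.init.zeros_(self.policy_head.bias)

    def forward(self, states, hidden=None):
        # states: (batch, seq_len, state_dim)
        out, hidden = self.lstm(states, hidden)
        logits = self.policy_head(out[:, -1])  # last timestep only
        return torch.distributions.Categorical(logits=logits), hidden


actor = LSTMActor()
states = torch.randn(4, 10, 50)  # batch of 4 sequences, 10 steps each
dist, _ = actor(states)

# Point 2: during training, sample from the distribution instead of
# taking argmax, so the agent still explores while the policy is untrained.
actions = dist.sample()
```

With this setup, identical sampled actions across a whole batch early in training would point at the inference path (e.g. a hidden argmax) rather than the network itself.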
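For point 3, one common way to keep large negative rewards from dominating the gradient is to normalize rewards with running statistics. This `RewardNormalizer` is a hypothetical helper sketched for illustration (Welford's online algorithm), not part of the question's code:

```python
class RewardNormalizer:
    """Running mean/std reward scaler (Welford's online algorithm).

    Keeps rewards near unit scale so occasional large negative rewards
    do not swamp the policy gradient. Hypothetical helper for illustration.
    """

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations
        self.eps = eps

    def update(self, reward):
        # Incrementally update running mean and variance.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward):
        std = (self.m2 / max(self.count, 1)) ** 0.5
        return (reward - self.mean) / (std + self.eps)


norm = RewardNormalizer()
for r in [-100.0, -50.0, 0.0, 10.0]:
    norm.update(r)
scaled = norm.normalize(-100.0)  # roughly unit scale after warm-up
```

Even with normalization, it is still worth checking the reward design itself: if one repeated action reliably avoids all penalties, no amount of scaling will make the agent explore away from it.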