
Can state in Proximal Policy Optimization contain history?


For example, can the state at timestep t actually be made up of the states at t and t-1?

S_t = [s_t, s_{t-1}]

i.e. does Proximal Policy Optimization already incorporate state history, or can it be made implicit in the state (or neither)?


Solution

  • You could concatenate your observations. This is very common to do in RL: in the Atari domain, the last four frames are usually joined into a single observation, which makes it possible for the agent to perceive change in the environment (for example, the direction and speed of a moving ball). A minimal sketch of this follows below.
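    A minimal sketch of such observation concatenation, assuming a classic Gym-style environment whose reset() returns an observation and whose step() returns a 4-tuple; the FrameStackWrapper name and the num_frames default are illustrative, not taken from any library:

```python
from collections import deque

import numpy as np


class FrameStackWrapper:
    """Concatenates the last `num_frames` observations into one array."""

    def __init__(self, env, num_frames=4):
        self.env = env
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with copies of the first observation so the
        # stacked shape is valid from the very first step.
        for _ in range(self.num_frames):
            self.frames.append(obs)
        return np.concatenate(list(self.frames), axis=0)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)  # deque drops the oldest frame automatically
        return np.concatenate(list(self.frames), axis=0), reward, done, info
```

    Ready-made versions of this idea also exist, e.g. the VecFrameStack wrapper in Stable-Baselines3.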

    A basic PPO algorithm does not keep track of state history by default. You could make this possible, though, by adding a recurrent layer (such as an LSTM) to the policy network, as sketched below.
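    A rough sketch of such a recurrent policy in PyTorch; the layer sizes and the RecurrentPolicy name are arbitrary illustrative choices, not a reference implementation:

```python
import torch
import torch.nn as nn


class RecurrentPolicy(nn.Module):
    """Policy network whose LSTM carries information across timesteps,
    so the agent can condition on history without stacking observations."""

    def __init__(self, obs_dim, num_actions, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, seq_len, obs_dim)
        x = torch.relu(self.encoder(obs_seq))
        x, hidden_state = self.lstm(x, hidden_state)
        logits = self.action_head(x)  # action logits per timestep
        return logits, hidden_state
```

    If you would rather not wire this up yourself, the sb3-contrib package ships a RecurrentPPO implementation built on the same idea.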