For example, can the state at timestep t actually be made of the states at t and t-1?
S_t = [s_t, s_{t-1}]
i.e., does Proximal Policy Optimization already incorporate the state history, or can it be made implicit in the state (or neither)?
You could concatenate your observations. This is very common to do in RL. In the Atari domain, for example, the last four frames are usually joined into a single observation. This makes it possible for the agent to understand change in the environment, such as an object's velocity.
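A minimal sketch of this idea, assuming a classic Gym-style env with `reset()`/`step()` returning the usual 4-tuple (the wrapper name and stack size are just illustrative):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Keeps the last k observations and returns them concatenated,
    so a single input lets the agent infer change over time."""
    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with the first observation so the shape is fixed
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info
```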
A basic PPO algorithm does not keep track of state history by default. You could make this possible, though, by adding a recurrent layer to the policy network.
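To illustrate, here is a rough sketch (not part of vanilla PPO) of a recurrent actor-critic network in PyTorch; the GRU's hidden state is what carries history across timesteps, and all sizes are placeholders:

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden_size=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_size), nn.Tanh())
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, n_actions)  # action logits
        self.value_head = nn.Linear(hidden_size, 1)            # state value

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden carries memory between calls
        x = self.encoder(obs_seq)
        x, hidden = self.gru(x, hidden)
        return self.policy_head(x), self.value_head(x), hidden
```

During rollouts you would pass the returned hidden state back in on the next step, and reset it whenever an episode ends.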