
Can state in Proximal Policy Optimization contain history?


For example, can the state at timestep t actually be made up of the states at t and t-1?

S_t = [s_t, s_{t-1}]

i.e. does Proximal Policy Optimization already incorporate state history, or can it be made implicit in the state (or neither)?


Solution

  • You could concatenate your observations. This is very common to do in RL: in the Atari domain, the last four frames are usually joined into a single observation, which makes it possible for the agent to perceive change in the environment (for example, the direction and speed of a moving ball). A minimal sketch of this follows below.
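    A minimal sketch of such observation concatenation, assuming a classic Gym-style environment whose reset() returns an observation and whose step() returns a 4-tuple; the FrameStackWrapper name and the num_frames default are illustrative, not taken from any library:

```python
from collections import deque

import numpy as np


class FrameStackWrapper:
    """Concatenates the last `num_frames` observations into one array."""

    def __init__(self, env, num_frames=4):
        self.env = env
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with copies of the first observation so the
        # stacked shape is valid from the very first step.
        for _ in range(self.num_frames):
            self.frames.append(obs)
        return np.concatenate(list(self.frames), axis=0)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)  # deque drops the oldest frame automatically
        return np.concatenate(list(self.frames), axis=0), reward, done, info
```

    Ready-made versions of this idea also exist, e.g. the VecFrameStack wrapper in Stable-Baselines3.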

    A basic PPO algorithm does not keep track of state history by default. You could make this possible, though, by adding a recurrent layer (such as an LSTM) to the policy network, as sketched below.
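    A rough sketch of such a recurrent policy in PyTorch; the layer sizes and the RecurrentPolicy name are arbitrary illustrative choices, not a reference implementation:

```python
import torch
import torch.nn as nn


class RecurrentPolicy(nn.Module):
    """Policy network whose LSTM carries information across timesteps,
    so the agent can condition on history without stacking observations."""

    def __init__(self, obs_dim, num_actions, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, seq_len, obs_dim)
        x = torch.relu(self.encoder(obs_seq))
        x, hidden_state = self.lstm(x, hidden_state)
        logits = self.action_head(x)  # action logits per timestep
        return logits, hidden_state
```

    If you would rather not wire this up yourself, the sb3-contrib package ships a RecurrentPPO implementation built on the same idea.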