(How) can I use reinforcement learning for already seen data?

Most tutorials and RL courses focus on teaching how to apply a model (e.g. Q-Learning) to an environment (e.g. gym environments) where one can input a state in order to get some output / reward.

How is it possible to use RL on historical data, where you cannot get new data? (For example, from a massive auction dataset, how can I derive the best policy using RL?)

Solution

If your dataset consists, for example, of time series, you can treat each instant of time as a state. You can then let your agent step through the series to learn a policy over it.

If your dataset is already labeled with actions, you can train the agent on it to learn the policy underlying those actions.
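As a rough illustration of learning the policy behind labeled actions, the sketch below simply picks, for each state, the action that was logged most often (a very basic form of imitation, not a full RL algorithm). The function name `clone_policy`, the state names and the toy log are all hypothetical and only assume that states are discrete and hashable.

```python
from collections import Counter, defaultdict

def clone_policy(logged_pairs):
    """Map each state to the action that was logged most often for it."""
    counts = defaultdict(Counter)
    for state, action in logged_pairs:
        counts[state][action] += 1
    # most_common(1) returns a list with the single (action, count) pair
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

# Hypothetical toy log of (state, action) pairs
logged_pairs = [
    ("low_price", "bid"),
    ("low_price", "bid"),
    ("low_price", "wait"),
    ("high_price", "wait"),
]
policy = clone_policy(logged_pairs)
```

With continuous or high-dimensional states, the same idea would typically use a supervised classifier in place of the counting table.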

The trick is to feed your agent each successive instant of time, as if it were exploring the data in real time.

Of course, you need to model the different states from the information available at each instant of time.
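A minimal sketch of this replay idea, assuming the historical data can be turned into discrete `(state, action, reward, next_state, done)` records: tabular Q-learning is run repeatedly over the logged transitions instead of over a live environment. The function names, hyperparameters and the tiny auction-like log are illustrative assumptions, not part of the original answer.

```python
from collections import defaultdict

def offline_q_learning(transitions, actions, alpha=0.1, gamma=0.9, epochs=200):
    """Run tabular Q-learning updates over a fixed log of transitions."""
    q = defaultdict(float)  # Q-values default to 0.0 for unseen pairs
    for _ in range(epochs):
        for state, action, reward, next_state, done in transitions:
            # No bootstrapping from terminal states
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q

def greedy_policy(q, states, actions):
    """Pick the highest-valued action for each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}

# Hypothetical log: two time steps, two possible actions
actions = ["bid", "wait"]
transitions = [
    ("t0", "wait", 0.0, "t1", False),
    ("t0", "bid", 1.0, "t1", False),
    ("t1", "bid", 5.0, "end", True),
    ("t1", "wait", 0.0, "end", True),
]
q = offline_q_learning(transitions, actions)
policy = greedy_policy(q, ["t0", "t1"], actions)
```

Note that an agent trained this way can only evaluate state-action pairs that actually appear in the log, so the learned policy is limited by how well the historical data covers the possible actions.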
