machine-learning, reinforcement-learning

(How) can I use reinforcement learning for already seen data?


Most tutorials and RL courses focus on teaching how to apply a model (e.g. Q-learning) to an environment (e.g. the Gym environments) to which one can feed a state in order to get some output / reward.

How is it possible to use RL on historical data, where you cannot get new data? (For example, given a massive auction dataset, how can I derive the best policy using RL?)


Solution

  • If your dataset consists of, for example, time series, you can treat each time step as a state. You can then have your agent step through the series to learn a policy over it.

    If your dataset is already labeled with actions, you can train the agent on it to learn the policy underlying those actions.

    The trick is to feed your agent each successive time step, as if it were exploring the data in real time.

    Of course, you need to model the different states from the information available at each time step.
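
As a rough illustration of the idea above (learning from a fixed log like this is usually called offline or batch RL), here is a minimal sketch of tabular Q-learning that replays logged time steps instead of querying an environment. Everything specific in it is an assumption made for the example: the toy `LOG`, the state labels `"low"` / `"high"`, the actions `"bid"` / `"wait"`, the reward values, and the helper names `build_transitions`, `offline_q_learning` and `greedy_policy`. A real auction dataset would need its own state encoding and reward definition.

```python
import random
from collections import defaultdict

# Hypothetical time-ordered log: (state, action, reward) per time step.
LOG = [
    ("low", "bid", 1.0),
    ("high", "bid", -2.0),
    ("low", "wait", 0.0),
    ("high", "wait", 0.0),
    ("low", "bid", 1.0),
    ("high", "wait", 0.0),
]
ACTIONS = ["bid", "wait"]


def build_transitions(rows):
    """Pair each logged step with the state of the following step."""
    transitions = []
    for t, (state, action, reward) in enumerate(rows):
        done = t == len(rows) - 1  # the last logged step has no successor
        next_state = None if done else rows[t + 1][0]
        transitions.append((state, action, reward, next_state, done))
    return transitions


def offline_q_learning(transitions, actions, alpha=0.1, gamma=0.9,
                       epochs=200, seed=0):
    """Sweep repeatedly over the fixed dataset, applying the Q-learning update."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values default to 0.0
    data = list(transitions)
    for _ in range(epochs):
        rng.shuffle(data)  # replay logged steps in random order
        for s, a, r, s_next, done in data:
            if done:
                target = r
            else:
                target = r + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def greedy_policy(q, states, actions):
    """Pick the highest-valued action for each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


transitions = build_transitions(LOG)
q_values = offline_q_learning(transitions, ACTIONS)
policy = greedy_policy(q_values, ["low", "high"], ACTIONS)
print(policy)
```

Note that the agent never chooses what to try next: it can only learn about state/action pairs that actually occur in the log, so Q-values for actions that are rare or absent in the data should not be trusted.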