I am currently training a PPO model for a simulation. The model fails to learn that certain conditions always lead to no reward.
These conditions are very simple rules, so I was trying to use them to create an 'expert' that the PPO model could learn from through imitation learning.
Example of Expert-Based Rules:
If resource A is unavailable, then don't select that resource.
If "X" & "Y" don't match, then don't select those.
Example with the imitation Library
I was looking at the "imitation" Python library. The example there uses an expert that is itself a PPO model trained for more iterations.
https://github.com/HumanCompatibleAI/imitation/blob/master/examples/1_train_bc.ipynb
Questions:
Is there a way to convert the simple "rule-based" expert into a PPO model which can be used for imitation learning?
Or is there a different approach to using a "rule-based" expert in imitation learning?
Looking at how behavioural cloning is implemented:
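    from imitation.algorithms import bc

    # `transitions` holds the recorded expert demonstrations.
    # Note: recent releases of imitation also expect an `rng` argument
    # (e.g. rng=np.random.default_rng(0)); check your installed version.
    bc_trainer = bc.BC(
        observation_space=env.observation_space,
        action_space=env.action_space,
        demonstrations=transitions,
    )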
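All you have to do is create demonstrations. You do not need to write an "agent" per se, let alone convert the rules into a PPO model. Just let your rule-based bot interact with your environment and record the resulting sequences of observations and actions; those recordings are the demonstrations.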
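To make the idea concrete, here is a minimal sketch of recording demonstrations from a rule-based expert. Everything in it is an assumption made for illustration: `ToyEnv`, its observation layout (a task kind plus per-resource availability and kind), the `NOOP` action, and the `Step` record are hypothetical stand-ins for your simulation, not part of the imitation library. It uses only the standard library so that it runs on its own.

```python
import random
from dataclasses import dataclass

NUM_RESOURCES = 3
NOOP = NUM_RESOURCES  # hypothetical "select nothing" action


@dataclass
class Step:
    """One recorded expert transition."""
    obs: tuple
    act: int
    next_obs: tuple
    done: bool


class ToyEnv:
    """Hypothetical stand-in for the real simulation."""

    def __init__(self, seed=0, horizon=5):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0

    def _sample_obs(self):
        # Observation: (task kind, availability flags, resource kinds).
        task_kind = self.rng.randrange(2)
        avail = tuple(self.rng.randrange(2) for _ in range(NUM_RESOURCES))
        kinds = tuple(self.rng.randrange(2) for _ in range(NUM_RESOURCES))
        return (task_kind, avail, kinds)

    def reset(self):
        self.t = 0
        return self._sample_obs()

    def step(self, action):
        # The toy dynamics ignore the action; a real simulation would not.
        self.t += 1
        return self._sample_obs(), self.t >= self.horizon


def rule_based_expert(obs):
    """Pick the first resource that is available AND matches the task kind."""
    task_kind, avail, kinds = obs
    for i in range(NUM_RESOURCES):
        if avail[i] and kinds[i] == task_kind:
            return i
    return NOOP  # no resource satisfies the rules


def collect_demonstrations(env, expert, n_episodes):
    """Roll out the expert and record every transition it produces."""
    steps = []
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            act = expert(obs)
            next_obs, done = env.step(act)
            steps.append(Step(obs, act, next_obs, done))
            obs = next_obs
    return steps


demos = collect_demonstrations(ToyEnv(seed=0), rule_based_expert, n_episodes=4)
```

For use with the library, the recorded observations and actions would then be stacked into arrays and wrapped in whatever demonstration container your installed version expects (I believe this is `imitation.data.types.Transitions`, but please verify against its documentation) before being passed as `demonstrations`.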