I am using the DQN agent from Ray/RLlib. To gain more insight into how the training process is going, I would like to access the internal state of the Adam optimizer, e.g. to visualize how the running average of the gradient changes over time. See the minimal code snippet below for illustration.
agent = DQNAgent(config=agent_config, env=self.env)
episode_results = []
for i in range(int(budget)):
    # add epoch results to the result list
    episode_results.append(agent.train())
    # add internal values of the optimizer
    episode_results[-1]['g_avg'] = None
    episode_results[-1]['g_square_avg'] = None
However, I fail to access the Adam optimizer. Since it is constructed in the optimizer() method of the agent's policy graph and then stored in the _optimizer member variable (according to the TFPolicyGraph constructor), my instinct was to access it via

agent._policy_graph._optimizer

From the DQN agent's policy graph:
@override(TFPolicyGraph)
def optimizer(self):
    return tf.train.AdamOptimizer(
        learning_rate=self.config["lr"],
        epsilon=self.config["adam_epsilon"])
From the TFPolicyGraph constructor:
self._optimizer = self.optimizer()
This just gives me:
AttributeError: type object 'DQNPolicyGraph' has no attribute '_optimizer'
The docs recommend using agent.local_evaluator; however, I cannot find Adam's state in there.
This is probably just me misunderstanding Ray's architecture. Can anyone help me with that?
Thank you and have a nice day!
The TF optimizer object is accessible via agent.get_policy()._optimizer.
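With that handle you can read Adam's slot variables, which hold exactly the running averages you are after. Below is a minimal sketch, assuming TF 1.x and the policy-graph API from the question; note that _sess is an internal attribute of TFPolicyGraph, so the attribute names may differ between RLlib versions.

import tensorflow as tf

policy = agent.get_policy()
optimizer = policy._optimizer  # the tf.train.AdamOptimizer built in optimizer()

# The policy lives in its own TF graph, so make it the default graph before
# enumerating variables. Adam keeps one "m" slot (running average of the
# gradient) and one "v" slot (running average of the squared gradient) per
# trainable variable.
with policy._sess.graph.as_default():
    for var in tf.trainable_variables():
        m = optimizer.get_slot(var, "m")
        v = optimizer.get_slot(var, "v")
        if m is not None and v is not None:
            m_val, v_val = policy._sess.run([m, v])
            print(var.name, m_val.mean(), v_val.mean())

The NumPy arrays m_val and v_val are what you could store under 'g_avg' and 'g_square_avg' in the training loop from the question.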
The reason you were seeing "no attribute _optimizer" before is that _policy_graph is the policy class, not the object instance. The instance is present in local_evaluator.policy_map, or can be retrieved via agent.get_policy().
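For completeness, here is a sketch of the policy_map route. In a single-agent setup the map contains exactly one policy; its ID string depends on the RLlib version, so iterating avoids hard-coding it:

# Look up the policy instance through the local evaluator's policy map.
for policy_id, pol in agent.local_evaluator.policy_map.items():
    # In a single-agent setup this is the same instance as agent.get_policy().
    print(policy_id, pol._optimizer)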