I am using the DQN agent from Ray/RLlib. To gain more insight into how the training process is going, I would like to access the internal state of the Adam optimizer, e.g. to visualize how the running average of the gradient changes over time. See the minimal code snippet below for illustration.
agent = DQNAgent(config=agent_config, env=self.env)
episode_results = []
for i in range(int(budget)):
    # add epoch results to the result list
    episode_results.append(agent.train())
    # add internal values of the optimizer
    episode_results[-1]['g_avg'] = None
    episode_results[-1]['g_square_avg'] = None
However, I fail to access the Adam optimizer. Since it is constructed in the optimizer() method of the agent's policy graph and then stored in the _optimizer member variable (according to the TFPolicyGraph constructor), my instinct was to access it via

agent._policy_graph._optimizer

From the DQN agent's policy graph:
@override(TFPolicyGraph)
def optimizer(self):
    return tf.train.AdamOptimizer(
        learning_rate=self.config["lr"],
        epsilon=self.config["adam_epsilon"])
From the TFPolicyGraph constructor:
self._optimizer = self.optimizer()
This just gives me:
AttributeError: type object 'DQNPolicyGraph' has no attribute '_optimizer'
The docs recommend using agent.local_evaluator; however, I cannot find Adam's state in there.
This is probably just me misunderstanding Ray's architecture. Can anyone help me with that?
Thank you and have a nice day!
The TF optimizer object is accessible via agent.get_policy()._optimizer.
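With that handle you can read Adam's slot variables, which hold exactly the running averages you are after. Below is a minimal sketch, assuming TF 1.x and the policy-graph API from the question; note that _sess is an internal attribute of TFPolicyGraph, so the attribute names may differ between RLlib versions.

import tensorflow as tf

policy = agent.get_policy()
optimizer = policy._optimizer  # the tf.train.AdamOptimizer built in optimizer()

# The policy lives in its own TF graph, so make it the default graph before
# enumerating variables. Adam keeps one "m" slot (running average of the
# gradient) and one "v" slot (running average of the squared gradient) per
# trainable variable.
with policy._sess.graph.as_default():
    for var in tf.trainable_variables():
        m = optimizer.get_slot(var, "m")
        v = optimizer.get_slot(var, "v")
        if m is not None and v is not None:
            m_val, v_val = policy._sess.run([m, v])
            print(var.name, m_val.mean(), v_val.mean())

The NumPy arrays m_val and v_val are what you could store under 'g_avg' and 'g_square_avg' in the training loop from the question.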
The reason you were seeing "no attribute _optimizer" before is that _policy_graph is the policy class, not the object instance. The instance is present in local_evaluator.policy_map, or can be retrieved via agent.get_policy().
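For completeness, here is a sketch of the policy_map route. In a single-agent setup the map contains exactly one policy; its ID string depends on the RLlib version, so iterating avoids hard-coding it:

# Look up the policy instance through the local evaluator's policy map.
for policy_id, pol in agent.local_evaluator.policy_map.items():
    # In a single-agent setup this is the same instance as agent.get_policy().
    print(policy_id, pol._optimizer)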