I'm using RL Coach through AWS SageMaker, and I'm running into an issue that I'm struggling to understand.
I'm performing RL using AWS SageMaker for the learning and AWS RoboMaker for the environment, much like DeepRacer, which also uses RL Coach. In fact, my code differs only slightly from the DeepRacer code on the learning side. The environment, however, is completely different.
What happens:
The agent raises an exception with the message: Failed to restore agent's checkpoint: 'main_level/agent/main/online/global_step'
The traceback points into this RL Coach module:
File "/someverylongpath/rl_coach/architectures/tensorflow_components/savers.py", line 93, in <dictcomp>
for ph, v in zip(self._variable_placeholders, self._variables)
KeyError: 'main_level/agent/main/online/global_step'
Just like DeepRacer, I apply a patch to RL Coach. One notable change in the patch is:
- self._variables = tf.global_variables()
+ self._variables = tf.trainable_variables()
But shouldn't that change result in 'main_level/agent/main/online/global_step' not being in self._variables?
I think the problem is that global_step is in self._variables when it should not be there.
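If that hypothesis is right, the failure mode can be modeled in plain Python. This is only an illustration of the zip/dict-comprehension pattern from savers.py, not RL Coach's actual code: plain strings stand in for TF placeholders and variables, and the variable names other than global_step are made up.

```python
# Variables the saver iterates over, as if collected with
# tf.global_variables() (which includes non-trainable variables
# such as global_step):
variables = [
    "main_level/agent/main/online/weights",       # illustrative name
    "main_level/agent/main/online/global_step",   # non-trainable
]
placeholders = [name + "_ph" for name in variables]

# Values restored from a checkpoint that was written using only
# tf.trainable_variables(), so global_step was never saved:
checkpoint_values = {
    "main_level/agent/main/online/weights": [0.1, 0.2],
}

try:
    # Mirrors the dict comprehension at savers.py line 93: every
    # variable in self._variables must have a value in the checkpoint.
    feed_dict = {
        ph: checkpoint_values[v]
        for ph, v in zip(placeholders, variables)
    }
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'main_level/agent/main/online/global_step'
```

In other words, a mismatch between the collection used when saving and the collection used when restoring is enough to reproduce exactly this KeyError.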
So there are a few things I don't understand about this problem, and I'm not familiar with RL Coach, so any help would be valuable.
A bit more info:
rl-coach-slim 1.0.0
and tensorflow 1.11.0
Update: I removed the patch (technically, I removed the patch command in my Dockerfile that was applying it), and now it works; the model is correctly restored from the checkpoint.