I'm working on a PPO agent that plays (well, should play) Doom using TF-Agents. As input to the agent, I am trying to give it a stack of 4 images. My complete code is in the following link: https://colab.research.google.com/drive/1chrlrLVR_rwAeIZhL01LYkpXsusyFyq_?usp=sharing
Unfortunately, my code does not run. It raises a TypeError at the line shown below (it is being run in Google Colaboratory):
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-d1571cbbda6b> in <module>()
8 t_step = tf_env.reset()
9 while (episode_steps <= max_steps_per_episode or (not t_step.is_last())):
---> 10 policy_step = agent.policy.action(t_step)
11 t_step = tf_env.step(policy_step.action)
12 episode_steps += 1
/usr/local/lib/python3.7/dist-packages/tf_agents/utils/nest_utils.py in assert_same_structure(nest1, nest2, check_types, expand_composites, message)
112 str2 = tf.nest.map_structure(
113 lambda _: _DOT, nest2, expand_composites=expand_composites)
--> 114 raise exception('{}:\n {}\nvs.\n {}'.format(message, str1, str2))
115
116
TypeError: policy_state and policy_state_spec structures do not match:
()
vs.
{'actor_network_state': ListWrapper([., .])}
The thing about this error is that, from what I've read in the TF-Agents documentation, the user is not supposed to do anything about the policy_state, since it is generated automatically from the agent's networks.
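For context, what the policy expects is advertised by its policy_state_spec. A quick way to inspect it (a small sketch, assuming the agent built in my notebook):

# `agent` is the PPO agent from the notebook. With a recurrent
# actor network this spec is non-empty, which is exactly the
# structure the error message prints on the "vs." side.
print(agent.policy.policy_state_spec)
# e.g. {'actor_network_state': ListWrapper([TensorSpec(...), TensorSpec(...)])}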
Here is a similar error I found; it didn't solve my problem, but it hinted at one of the solutions I tried: py_environment 'time_step' doesn't match 'time_step_spec'
After reading that question and its answer, I realized I was promising an observation_spec like this:
self._observation_spec = array_spec.BoundedArraySpec(shape=(4, 160, 260, 3), dtype=np.float32, minimum=0, maximum=1, name='screen_observation')
But what I was passing was a list of 4 np.arrays with shape = (160, 260, 3):
self._stacked_frames = []
for _ in range(4):
    new_frame = np.zeros((160, 260, 3), dtype=np.float32)
    self._stacked_frames.append(new_frame)
I did this because I thought the "shape" of my data wouldn't change, since the list always has the same number of elements as the first dimension of the observation_spec. Lists also made it easier to delete past frames and add new ones, like this:
def stack_frames(self):
    # This gets the current frame of the game
    new_frame = self.preprocess_frame()

    if self._game.is_new_episode():
        for frame in range(4):
            self._stacked_frames.append(new_frame)
            # This pop was meant to clear out the empty frames that were already in the list
            self._stacked_frames.pop(0)
    else:
        self._stacked_frames.append(new_frame)
        self._stacked_frames.pop(0)

    return self._stacked_frames
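For reference, one way to keep the list internally and still return something that matches the spec would be to stack it into a single (4, 160, 260, 3) array right before returning it. A minimal sketch, assuming the py_environment class from my notebook:

import numpy as np
from tf_agents.environments import utils

def stack_frames(self):
    new_frame = self.preprocess_frame()
    if self._game.is_new_episode():
        # Start the episode with four copies of the first frame.
        self._stacked_frames = [new_frame] * 4
    else:
        self._stacked_frames.append(new_frame)
        self._stacked_frames.pop(0)
    # Stack the list into one array so the observation matches the
    # promised (4, 160, 260, 3) observation_spec exactly.
    return np.stack(self._stacked_frames, axis=0)

A cheap end-to-end check is utils.validate_py_environment(env, episodes=2), where env is an instance of the environment; it steps the environment with random actions and raises if observations ever stop matching the specs.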
I had tried plain np.arrays before, but could not manage to delete past frames and add new ones. I was probably not doing it right, but it felt like self._stacked_frames was born with the same shape as the observation_spec, and arrays could not simply be deleted from it or added to it:
self._stacked_frames = np.zeros((4, 160, 260, 3), dtype=np.float32)

def stack_frames(self):
    new_frame = self.preprocess_frame()

    if self._game.is_new_episode():
        for frame in range(4):
            # This delete was meant to clear out the empty frames that were already in the array
            self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
            # I tried "np.concatenate((self._stacked_frames, new_frame))" as well
            self._stacked_frames = np.vstack((self._stacked_frames, new_frame))
    else:
        self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
        # I tried "np.concatenate((self._stacked_frames, new_frame))" as well
        self._stacked_frames = np.vstack((self._stacked_frames, new_frame))

    return self._stacked_frames
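As an aside, the snippet above has a shape problem independent of the spec question: np.vstack (and np.concatenate) require both operands to have the same number of dimensions, so combining the (3, 160, 260, 3) buffer with a (160, 260, 3) frame raises a ValueError. A version that should preserve the (4, 160, 260, 3) shape, under the same assumptions as my code above:

import numpy as np

def stack_frames(self):
    new_frame = self.preprocess_frame()  # shape (160, 260, 3)
    if self._game.is_new_episode():
        # Fill all four slots with the first frame of the episode.
        self._stacked_frames = np.repeat(new_frame[np.newaxis], 4, axis=0)
    else:
        # Drop the oldest frame, then append the newest one with a
        # leading axis so both operands are 4-D.
        self._stacked_frames = np.concatenate(
            [self._stacked_frames[1:], new_frame[np.newaxis]], axis=0)
    return self._stacked_frames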
This approach did not work either. Like I said, I was probably doing it wrong. I saw three ways out of this stalemate; the last, shifting the frames in place inside a fixed-shape array, works like this:
self._stacked_frames slot:      1 | 2 | 3 | 4
Current contents:               A | B | C | D
New game image:                 E
Contents after shift step 1:    B | B | C | D
Contents after shift step 2:    B | C | C | D
Contents after shift step 3:    B | C | D | D
Contents after shift step 4:    B | C | D | E
New self._stacked_frames:       B | C | D | E
This last one seemed like the most likely way to work around my problem, assuming I was right about its cause. I tried it, but the TypeError persisted. I set it up like this:
self._stacked_frames = np.zeros((self._frame_stack_size, 160, 260, 3), dtype=np.float32)
and then:
def stack_frames(self):
    new_frame = self.preprocess_frame()

    if self._game.is_new_episode():
        for frame in range(self._frame_stack_size):
            self._stacked_frames[frame] = new_frame
    else:
        for frame in range(self._frame_stack_size - 1):
            self._stacked_frames[frame] = self._stacked_frames[frame + 1]
        self._stacked_frames[self._frame_stack_size - 1] = new_frame

    return self._stacked_frames
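For what it's worth, the same shift can be written without the explicit loop using np.roll (same assumptions as above):

# Move every frame one slot toward index 0; the oldest frame wraps
# around to the last slot...
self._stacked_frames = np.roll(self._stacked_frames, shift=-1, axis=0)
# ...where it is overwritten by the newest frame.
self._stacked_frames[-1] = new_frame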
Two questions, then: is this observation mismatch really what is causing the TypeError, and if so, how do I fix it?
Answer:

I had the same issue, and it happened when calling policy.action(time_step). action takes an optional parameter policy_state, which defaults to ().

I fixed the issue by calling:

policy.action(time_step, policy.get_initial_state(batch_size=BATCH_SIZE))

I'm just starting with TF-Agents, so I hope this doesn't have any undesired side effects.
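Spelled out against the loop in the traceback, that fix looks roughly like this. A hedged sketch, assuming the agent, tf_env, and max_steps_per_episode from the question's notebook; note that action() returns a PolicyStep whose state field should be fed back on the next call, and that the original loop's or probably wants to be and so it stops at either limit:

policy = agent.policy
policy_state = policy.get_initial_state(batch_size=tf_env.batch_size)

t_step = tf_env.reset()
episode_steps = 0
while episode_steps <= max_steps_per_episode and not t_step.is_last():
    policy_step = policy.action(t_step, policy_state)
    policy_state = policy_step.state  # carry the RNN state forward
    t_step = tf_env.step(policy_step.action)
    episode_steps += 1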