
How to fix a TypeError between policy_state and policy_state_spec in TF-Agents?


I'm working on a PPO agent that plays (well, should play) Doom using TF-Agents. As input to the agent, I am trying to give it a stack of 4 images. My complete code is in the following link: https://colab.research.google.com/drive/1chrlrLVR_rwAeIZhL01LYkpXsusyFyq_?usp=sharing

Unfortunately, my code does not run. It raises a TypeError at the line shown below (it is being run in Google Colaboratory).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-d1571cbbda6b> in <module>()
      8   t_step = tf_env.reset()
      9   while (episode_steps <= max_steps_per_episode or (not t_step.is_last())):
---> 10     policy_step = agent.policy.action(t_step)
     11     t_step = tf_env.step(policy_step.action)
     12     episode_steps += 1

5 frames
/usr/local/lib/python3.7/dist-packages/tf_agents/utils/nest_utils.py in assert_same_structure(nest1, nest2, check_types, expand_composites, message)
    112     str2 = tf.nest.map_structure(
    113         lambda _: _DOT, nest2, expand_composites=expand_composites)
--> 114     raise exception('{}:\n  {}\nvs.\n  {}'.format(message, str1, str2))
    115 
    116 

TypeError: policy_state and policy_state_spec structures do not match:
  ()
vs.
  {'actor_network_state': ListWrapper([., .])}

The thing about this error is that, from what I've read in the TF-Agents documentation, the user is not supposed to do anything with the policy_state, since it is generated automatically based on the agent's networks.
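For reference, the structure a policy expects can be inspected directly: a recurrent actor network (which is what the {'actor_network_state': ...} in the error message suggests) produces a non-empty spec, while a feed-forward one produces (). A minimal check, assuming agent is already built as in the notebook:

# Print the structure the policy expects as its policy_state.
# An RNN-based actor network yields something like
# {'actor_network_state': ListWrapper([., .])}; a feed-forward one yields ().
print(agent.policy.policy_state_spec)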

Here is a similar error I found. It didn't solve my problem, but it did hint at one of the solutions I tried: py_environment 'time_step' doesn't match 'time_step_spec'

After reading the question and the answer above, I realized I was promising an observation_spec like this:

self._observation_spec = array_spec.BoundedArraySpec(shape=(4, 160, 260, 3), dtype=np.float32, minimum=0, maximum=1, name='screen_observation')

But what I was passing was a list of 4 np.arrays with shape = (160, 260, 3):

self._stacked_frames = []
for _ in range(4):
  new_frame = np.zeros((160, 260, 3), dtype=np.float32)
  self._stacked_frames.append(new_frame)

I did this because I thought the "shape" of my data wouldn't change, since the list always has the same number of elements as the first dimension of the observation_spec. A list also made it easier to delete past frames and append new ones, like this:

def stack_frames(self):
  # This gets the current frame of the game
  new_frame = self.preprocess_frame()

  if self._game.is_new_episode():
    for _ in range(4):
      self._stacked_frames.append(new_frame)
      # This pop was meant to clear an empty frame that was already in the list
      self._stacked_frames.pop(0)
  else:
    self._stacked_frames.append(new_frame)
    self._stacked_frames.pop(0)
  return self._stacked_frames
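(For completeness, a collections.deque with maxlen=4 would handle this append-and-drop bookkeeping automatically, and np.stack turns the result into the (4, 160, 260, 3) array the spec promises. A sketch under the same assumptions, i.e. preprocess_frame returns a (160, 260, 3) float32 array:)

from collections import deque

import numpy as np

def stack_frames(self):
  new_frame = self.preprocess_frame()

  if self._game.is_new_episode():
    # Fill every slot with the first frame of the new episode.
    self._stacked_frames = deque([new_frame] * 4, maxlen=4)
  else:
    # maxlen=4 makes the deque drop the oldest frame on append.
    self._stacked_frames.append(new_frame)
  # Stack into a single (4, 160, 260, 3) float32 array to match the spec.
  return np.stack(self._stacked_frames)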

I had tried plain np.arrays before, but was not able to delete past frames and add new ones. Probably I was not doing it right, but it felt like self._stacked_frames was created with the same shape as the observation_spec and could not simply shrink or grow:

self._stacked_frames = np.zeros((4, 160, 260, 3), dtype=np.float32)

def stack_frames(self):
  new_frame = self.preprocess_frame()

  if self._game.is_new_episode():
    for _ in range(4):
      # This delete was meant to clear an empty frame that was already in the array
      self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
      # I tried "np.concatenate((self._stacked_frames, new_frame))" as well
      self._stacked_frames = np.vstack((self._stacked_frames, new_frame))
  else:
    self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
    # I tried "np.concatenate((self._stacked_frames, new_frame))" as well
    self._stacked_frames = np.vstack((self._stacked_frames, new_frame))
  return self._stacked_frames
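(For what it's worth, the vstack call above raises a ValueError because self._stacked_frames has shape (3, 160, 260, 3) after the delete while new_frame has shape (160, 260, 3), so the two arrays differ in number of dimensions; np.concatenate fails for the same reason. Giving new_frame a leading stack axis, and slicing instead of np.delete, would make the pure-NumPy update work. A sketch of a repaired update:)

# Drop the oldest frame via slicing, then append the new one along axis 0.
# new_frame[np.newaxis] has shape (1, 160, 260, 3), so the result
# is (4, 160, 260, 3) again.
self._stacked_frames = np.concatenate(
    (self._stacked_frames[1:], new_frame[np.newaxis]), axis=0)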

This approach did not work. Like I said, I was probably doing it wrong. I see three ways out of this stalemate:

  1. I declare the observation_spec as a list of four frames, each an np.array of shape (160, 260, 3);
  2. I declare the observation_spec like I did, but delete and add frames to self._stacked_frames the right way (I'm not sure that is possible, since self._stacked_frames is declared with shape (4, 160, 260, 3), and I don't know whether it can temporarily become (3, 160, 260, 3) or (5, 160, 260, 3) before going back to (4, 160, 260, 3));
  3. I still declare the observation_spec like I did, but I neither delete nor add frames. Instead, I loop over the stack, copying the second frame into the first slot, the third frame into the second slot, the fourth frame into the third slot, and finally the new frame into the fourth slot. An illustration follows:
             self._stacked_frames slot: 1 | 2 | 3 | 4
Game image inside self._stacked_frames: A | B | C | D
                        New game image: E
      Stack contents after copy step 1: B | B | C | D
      Stack contents after copy step 2: B | C | C | D
      Stack contents after copy step 3: B | C | D | D
      Stack contents after copy step 4: B | C | D | E
              New self._stacked_frames: B | C | D | E

This last one seemed like the most promising workaround, assuming I'm right about the cause of the problem. I tried it, but the TypeError persisted. This is how I tried it:

self._stacked_frames = np.zeros((self._frame_stack_size, 160, 260, 3), dtype=np.float32)

and then:

def stack_frames(self):
  new_frame = self.preprocess_frame()

  if self._game.is_new_episode():
    for frame in range(self._frame_stack_size):
      self._stacked_frames[frame] = new_frame
  else:
    for frame in range(self._frame_stack_size - 1):
      self._stacked_frames[frame] = self._stacked_frames[frame + 1]
    self._stacked_frames[self._frame_stack_size - 1] = new_frame
  return self._stacked_frames
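(Incidentally, the element-wise copy loop in the else branch can be collapsed into two lines with np.roll, which returns a shifted copy; a compact equivalent:)

# Shift every frame one slot earlier; the wrapped-around (stale) copy
# of the oldest frame ends up in the last slot and is overwritten.
self._stacked_frames = np.roll(self._stacked_frames, -1, axis=0)
self._stacked_frames[-1] = new_frame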

Two questions, then:

  1. Considering I'm right about the cause of the TypeError, which of the three fixes is best? And is there anything wrong with the way I tried my solution for the third possibility?
  2. Considering I might not be right about the TypeError, what is this error about, then?

Solution

  • I had the same issue, and it happened when calling policy.action(time_step). action takes an optional parameter policy_state, which defaults to ().

    I fixed the issue by calling

    policy.action(time_step, policy.get_initial_state(batch_size=BATCH_SIZE))
    

    I'm just starting with TF-Agents, so I hope this doesn't have any undesired side effects.
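
    To make the fix concrete, here is a minimal sketch of the question's episode loop with the policy state threaded through. Variable names follow the snippet in the traceback; tf_env.batch_size is assumed to come from a batched TF environment:

    # Get an initial state matching policy_state_spec, then keep feeding
    # the state returned in each PolicyStep back into action().
    t_step = tf_env.reset()
    policy_state = agent.policy.get_initial_state(batch_size=tf_env.batch_size)

    episode_steps = 0
    while episode_steps <= max_steps_per_episode and not t_step.is_last():
      policy_step = agent.policy.action(t_step, policy_state)
      policy_state = policy_step.state
      t_step = tf_env.step(policy_step.action)
      episode_steps += 1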