python tensorflow reinforcement-learning tensorflow-agents

Benefit of storing state as a list/integer in tensorflow agents


In the environment tutorial of tensorflow agents (https://www.tensorflow.org/agents/tutorials/2_environments_tutorial), the state is stored as an integer. When the state is required, it is converted to a numpy array:

from tf_agents.environments import py_environment
from tf_agents.trajectories import time_step as ts
import numpy as np

class CardGameEnv(py_environment.PyEnvironment):

    def __init__(self):
        self._state = 0

    def _step(self, action):
        # Convert the integer state to an array only when building the observation.
        state_array = np.array([self._state], dtype=np.int32)
        return ts.transition(state_array, reward=1.0, discount=0.9)

Is there any reason why they do this, instead of just storing the state directly as a numpy array? So like this:

from tf_agents.environments import py_environment
from tf_agents.trajectories import time_step as ts
import numpy as np

class CardGameEnv(py_environment.PyEnvironment):

    def __init__(self):
        self._state = np.array([0], dtype=np.int32)

    def _step(self, action):
        # The stored array is passed as the observation without conversion.
        return ts.transition(self._state, reward=1.0, discount=0.9)

Is there any downside to using the second method? Or is this equally valid?


Solution

  • For convenience, I often do not store the state as a numpy array. Sometimes I use a pandas DataFrame, sometimes a list; it depends on how you update your current state.

    That said, storing the state as a numpy array saves a conversion step, since you do not need to build a new array each time you return an observation within a transition. For a state as small as this one, the saving is likely negligible.
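    One thing to watch for with the array approach is aliasing. The following is a minimal sketch in plain numpy, not TF-Agents code: the classes `IntStateSketch` and `ArrayStateSketch` and their `step` method are hypothetical stand-ins for the two variants in the question. It assumes the state is updated in place; if you return the stored array itself, observations handed out earlier point at the same buffer and change with it.

```python
import numpy as np


class IntStateSketch:
    """Hypothetical stand-in: keeps the state as a plain int."""

    def __init__(self):
        self._state = 0

    def step(self, action):
        self._state += action
        # A fresh array is built on every call, so earlier observations
        # are not affected by later updates.
        return np.array([self._state], dtype=np.int32)


class ArrayStateSketch:
    """Hypothetical stand-in: keeps the state as a numpy array."""

    def __init__(self):
        self._state = np.array([0], dtype=np.int32)

    def step(self, action):
        self._state += action  # in-place update of the stored array
        # Returning self._state hands out a reference to the same buffer;
        # returning self._state.copy() instead would avoid the aliasing.
        return self._state


int_env = IntStateSketch()
first_int_obs = int_env.step(1)
int_env.step(1)

arr_env = ArrayStateSketch()
first_arr_obs = arr_env.step(1)
arr_env.step(1)

print(first_int_obs)  # still reflects the state after the first step
print(first_arr_obs)  # now reflects the state after the second step
```

    If the observations are stored anywhere (for example in a replay buffer that keeps references), returning a copy of the array, or rebinding `self._state` to a new array on each update instead of mutating it, should avoid this.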