Tags: python, reinforcement-learning, openai-gym

OpenAI Gym - How to create one-hot observation space?


Aside from OpenAI's docs, I haven't been able to find more detailed documentation.

I need to know the correct way to create:

  1. An action space with n possible actions, numbered 1..n. (I am currently using a Discrete action space.)

  2. An observation space that has 2^n states: one state for every possible combination of actions that have been taken. I would like a one-hot representation of the action vector: 1 if the action has already been taken, 0 if it has not been taken yet.

How do I do that with OpenAI's Gym?

Thanks


Solution

  • None of the spaces that ship in gym.spaces at the time of writing can be used to mirror a one-hot encoding representation.

    Luckily for us, we can define our own space by subclassing gym.Space.

    I have made such a class, which may be what you need:

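    import gym
    import numpy as np


    class OneHotEncoding(gym.Space):
        """
        One-hot vectors of length `size`: {0,...,1,...,0}

        Example usage:
        self.observation_space = OneHotEncoding(size=4)
        """
        def __init__(self, size=None):
            assert isinstance(size, int) and size > 0
            self.size = size
            # sample() returns a float vector of length `size`,
            # so declare that shape and dtype to the base class.
            gym.Space.__init__(self, (size,), np.float64)

        def sample(self):
            # Pick one index uniformly at random and set only that entry to 1.
            one_hot_vector = np.zeros(self.size)
            one_hot_vector[np.random.randint(self.size)] = 1
            return one_hot_vector

        def contains(self, x):
            if isinstance(x, (list, tuple, np.ndarray)):
                # list.count (lists have no .contains method) gives the
                # number of occurrences of a value.
                values = list(x)
                number_of_zeros = values.count(0)
                number_of_ones = values.count(1)
                return (
                    len(values) == self.size
                    and number_of_zeros == (self.size - 1)
                    and number_of_ones == 1
                )
            else:
                return False

        def __repr__(self):
            return "OneHotEncoding(%d)" % self.size

        def __eq__(self, other):
            # Only another OneHotEncoding of the same size is equal.
            return isinstance(other, OneHotEncoding) and self.size == other.size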

    You can use it thus:

    >>> space = OneHotEncoding(size=3)
    >>> space.sample()
    array([0., 1., 0.])
    >>> space.sample()
    array([1., 0., 0.])
    >>> space.sample()
    array([0., 0., 1.])
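
    One caveat about the second requirement in the question: a strict one-hot vector of length n only has n possible values, while "which actions have been taken so far" can have several 1s at once, which is where the 2^n states come from. As far as I know, gym.spaces.MultiBinary(n) describes that kind of 0/1 vector. The sketch below is only an illustration of the bookkeeping and uses plain NumPy so it runs without gym; `mark_action_taken` and `n_actions` are hypothetical names, and actions are assumed to be indexed from 0 as with a Discrete space.

```python
import numpy as np


def mark_action_taken(taken, action):
    """Return a copy of the 0/1 vector `taken` with entry `action` set to 1."""
    updated = np.array(taken, dtype=np.int8, copy=True)
    updated[action] = 1
    return updated


n_actions = 3  # hypothetical number of actions

# Nothing has been taken at the start of an episode.
taken = np.zeros(n_actions, dtype=np.int8)
taken = mark_action_taken(taken, 1)
taken = mark_action_taken(taken, 2)
print(taken.tolist())  # [0, 1, 1]

# Each entry is independently 0 or 1, hence 2 ** n_actions possible vectors.
print(2 ** n_actions)  # 8
```

    An environment's step() could return a vector like `taken` as its observation, and reset() could return the all-zeros vector.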
    

    Hope I could help