
More efficient way for multidimensional state-action space tiling than with np.meshgrid?


First of all, this is for practice and comparison; I know there are more efficient ways to tile a state space than with a linear grid.

To run a reinforcement learning algorithm, I would like to tile my state and action spaces linearly. As a result, I want every state-action pair in array form. The problem is that different (Gym) environments have different state- and action-space dimensions, so I don't want hard-coded variables or dimensions. I need to compute every state-action pair given only the min and max of each space.

I've mostly solved the easy problems, but none of the solutions are "pretty".

First, let's compute the state and action spaces by tiling each dimension with linspace from min to max. The values below are for one random test environment.

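import numpy as np
NOF_ACTION_SPACE_TILES = 20
NOF_STATE_SPACE_TILES = 10
# Bounds of the test environment; the highs are assumed symmetric to the lows
# (as in Gym's Pendulum environment).
action_low = np.array([-2])
action_high = np.array([2])
state_low = np.array([-1, -1, -8])
state_high = np.array([1, 1, 8])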

action_space = np.vstack([*[x.flatten() for x in (np.meshgrid(*(np.linspace(action_low, action_high, NOF_ACTION_SPACE_TILES).T)))]]).T

state_space = np.vstack([*[x.flatten() for x in (np.meshgrid(*(np.linspace(state_low, state_high, NOF_STATE_SPACE_TILES).T)))]]).T

That works as intended and gives all possible combinations for the states and for the actions on their own. Is there a more straightforward way to do this? I had to use *[] twice, because np.meshgrid returns multiple matrices and I needed to flatten them.
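A somewhat more direct way to build the same grids is to stack the meshgrid outputs along a new last axis and reshape, which avoids the list comprehension and the final transpose. The helper name `linear_grid` is hypothetical, and `indexing="ij"` is my choice so rows come out in lexicographic order; the set of points is the same either way. The upper bounds are assumed symmetric to the lows, as above.

```python
import numpy as np

def linear_grid(low, high, n):
    """Every point of an n-per-dimension linear grid over the box [low, high].

    Hypothetical helper: low and high are 1-D sequences of equal length d;
    the result has shape (n**d, d).
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # np.linspace with array bounds gives shape (n, d); transpose to (d, n)
    axes = np.linspace(low, high, n).T
    # meshgrid returns d arrays of shape (n, ..., n); stack them on a new last axis
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack(mesh, axis=-1).reshape(-1, low.size)

# Bounds from the question (highs assumed symmetric to the lows)
action_space = linear_grid([-2], [2], 20)
state_space = linear_grid([-1, -1, -8], [1, 1, 8], 10)
print(action_space.shape, state_space.shape)  # (20, 1) (1000, 3)
```

Because `low.size` is read from the bounds, the same call works for any Gym Box space without hard-coded dimensions.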

Now to the funny part...

In the end I want every possible state-action pair: every state combined with every action. This is quick to code with for loops, but, well... numpy and for loops are not speedy friends. So here's my workaround, which works for a 1D action space:
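For reference, the straightforward for-loop version might look like the sketch below (`pairs_loop` is a hypothetical name, and the toy arrays are arbitrary). It pins down the target layout: one row per pair, the state components followed by the action components, with the state index varying slowest.

```python
import numpy as np

def pairs_loop(state_space, action_space):
    """Hypothetical loop version: one row per (state, action) combination."""
    rows = []
    for s in state_space:          # outer loop: states vary slowest
        for a in action_space:     # inner loop: every action for this state
            rows.append(np.concatenate((s, a)))
    return np.array(rows)

state_space = np.array([[-1.0, -1.0, -8.0], [1.0, 1.0, 8.0]])  # 2 toy states
action_space = np.array([[-2.0], [0.0], [2.0]])               # 3 toy actions
result = pairs_loop(state_space, action_space)
print(result.shape)  # (6, 4)
```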

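# meshgrid ravels both inputs, so s_s and a_s have shape (n_actions, n_states * state_dim)
s_s, a_s = np.meshgrid(state_space, action_space)

# Each group of state_space.shape[1] consecutive entries belongs to one state row,
# so reshape a_s the same way and keep one column per row for the action.
state_action_space = np.concatenate((
   s_s.reshape(-1, state_space.shape[1]),
   a_s.reshape(-1, state_space.shape[1])[:, :action_space.shape[1]]), axis=1)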

Here state_space.shape[1] is the dimension of a single state and action_space.shape[1] the dimension of a single action.

One problem is that np.meshgrid returns a_s for each of the 3 state-space dimensions, so reshaping it as above is awkward: the states need to be reshaped to 3xn and the actions to 1xn.
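The root cause is that `np.meshgrid` treats every input as 1-D: a 2-D `state_space` is raveled, so the grid runs over individual coordinates rather than over state rows. A small check with toy arrays (sizes chosen arbitrarily):

```python
import numpy as np

state_space = np.arange(12.0).reshape(4, 3)   # 4 toy states, 3 components each
action_space = np.array([[10.0], [20.0]])     # 2 toy 1-D actions

s_s, a_s = np.meshgrid(state_space, action_space)
# Both inputs were raveled: 12 state coordinates x 2 action values
print(s_s.shape, a_s.shape)  # (2, 12) (2, 12)
# Each row of s_s is the raveled state array, and each row of a_s repeats one
# action once per state *coordinate*, i.e. state_space.shape[1] times per state.
print(a_s[0])
```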

This is even worse than the code above, but it works for now. Does anyone have suggestions on how to use meshgrid (or something else) properly and efficiently?

In the end, the second step is just a combination of every row of the two matrices. There has to be a better way...
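Pairing every row of one matrix with every row of another is a row-wise Cartesian product, which can be done without meshgrid: repeat each row of the first matrix once per row of the second, and tile the second matrix once per row of the first. A minimal sketch (`cartesian_rows` is a hypothetical name); it works for any state or action dimension, and it yields the rows in the same order as the nested loop "for each state, for each action":

```python
import numpy as np

def cartesian_rows(a, b):
    """Pair every row of a with every row of b.

    Hypothetical helper: a has shape (n, p), b has shape (m, q); the result
    has shape (n * m, p + q), with the row index of a varying slowest.
    """
    left = np.repeat(a, len(b), axis=0)   # a0, a0, ..., a1, a1, ...
    right = np.tile(b, (len(a), 1))       # b0, b1, ..., b0, b1, ...
    return np.concatenate((left, right), axis=1)

states = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])  # toy states
actions = np.array([[-1.0], [1.0]])                      # toy actions
pairs = cartesian_rows(states, actions)
print(pairs.shape)  # (6, 3)
```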


Solution

  • Thanks to both answers above, here are my final results. I still had to use *() to unpack the linspace output for meshgrid, but it looks more human-readable now. The big issue with my earlier state-action code was that I overcomplicated it. It's just stacking copies of the arrays on top of each other: repeat (here via tile) each row of the state-space array as many times as there are distinct actions in the action space, which is ACTION_SPACE_SIZE^(action-dims), and tile the action-space array once per state.

        import numpy as np

        # env is a Gym environment with Box spaces; ACTION_SPACE_SIZE and STATE_SPACE_SIZE are tiles per dimension
        action_space = np.stack(np.meshgrid(*(np.linspace(env.action_space.low, env.action_space.high, ACTION_SPACE_SIZE)).T), -1).reshape(-1, env.action_space.shape[0])
    
        state_space = np.stack(np.meshgrid(*(np.linspace(env.observation_space.low, env.observation_space.high, STATE_SPACE_SIZE)).T), -1).reshape(-1, env.observation_space.shape[0])
    
        # Repeat each state row once per action, and tile the whole action grid once per state
        state_action_space = np.concatenate((
            np.tile(state_space, (action_space.shape[0])).reshape(-1, state_space.shape[1]),
            np.tile(action_space, (state_space.shape[0], 1))
            ), axis=1)
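The `np.tile(...).reshape(...)` step above repeats each state row once per action, so it should be equivalent to `np.repeat(state_space, n_actions, axis=0)`. A quick self-check with small toy grids standing in for the Gym-derived ones (the bounds and sizes are arbitrary, and no environment is needed):

```python
import numpy as np

# Toy grids built the same way as in the solution (values are arbitrary)
state_space = np.stack(np.meshgrid(*np.linspace([-1, -1], [1, 1], 3).T), -1).reshape(-1, 2)
action_space = np.linspace([-2], [2], 4)  # shape (4, 1)

n_actions = action_space.shape[0]
tiled = np.tile(state_space, n_actions).reshape(-1, state_space.shape[1])
repeated = np.repeat(state_space, n_actions, axis=0)
print(np.array_equal(tiled, repeated))  # True

state_action_space = np.concatenate(
    (repeated, np.tile(action_space, (state_space.shape[0], 1))), axis=1)
print(state_action_space.shape)  # (36, 3)
# Every (state, action) combination should occur exactly once
print(len(np.unique(state_action_space, axis=0)) == state_space.shape[0] * n_actions)  # True
```

`np.repeat` states the intent directly and skips the intermediate wide array that `np.tile` followed by `reshape` creates.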