My goal is to develop a DQN agent that chooses its actions according to a given strategy/policy. I previously worked with OpenAI Gym environments, but now I want to create my own RL environment.
At this stage, the agent should either choose a random action or choose its action based on the predictions of a deep neural network (defined in the class DQN).
So far, I have set up both the neural network model and my environment. The NN should receive states as its input. A state is one of 11 possible scalar values ranging from 9.5 to 10.5 (9.5, 9.6, ..., 10.4, 10.5). Since we're dealing with RL, the agent generates its data during the training process. The output should be either 0 or 1, corresponding to the recommended action.
Now I would like to feed my agent a scalar value, e.g. a sample state of x = 10, and let it decide which action to take. When Agent.select_action() is called, I encounter an issue related to the input shape/input dimension.
Here's the code:
1. DQN Class:
    class DQN():
        def __init__(self, state_size, action_size, lr):
            self.state_size = state_size
            self.action_size = action_size
            self.lr = lr
            self.model = Sequential()
            self.model.add(Dense(128, input_dim=self.state_size, activation='relu'))
            self.model.add(Dense(128, activation='relu'))
            self.model.add(Dense(self.action_size, activation='linear'))
            self.model.compile(optimizer=Adam(lr=self.lr), loss='mse')
            self.model.summary()

        def model_info(self):
            model_description = (
                '\n\n---Model_INFO Summary: The model was passed {} state sizes,'
                '\n {} action sizes and a learning rate of {} -----'
            ).format(self.state_size, self.action_size, self.lr)
            return model_description

        def predict(self, state):
            return self.model.predict(state)

        def train(self, state, q_values):
            self.state = state
            self.q_values = q_values
            return self.model.fit(state, q_values, verbose=0)

        def load_weights(self, path):
            self.model.load_weights(path)

        def save_weights(self, path):
            self.model.save_weights(path)

2. Agent Class:
    NUM_EPISODES = 100
    MAX_STEPS_PER_EPISODE = 100
    EPSILON = 0.5
    EPSILON_DECAY_RATE = 0.001
    EPSILON_MIN = 0.01
    EPSILON_MAX = 1
    DISCOUNT_FACTOR = 0.99
    REPLAY_MEMORY_SIZE = 50000
    BATCH_SIZE = 50
    TRAIN_START = 100
    ACTION_SPACE = [0, 1]
    STATE_SIZE = 11
    LEARNING_RATE = 0.01

    class Agent():
        def __init__(self, num_episodes, max_steps_per_episode, epsilon, epsilon_decay_rate,
                     epsilon_min, epsilon_max, discount_factor, replay_memory_size, batch_size,
                     train_start):
            self.num_episodes = NUM_EPISODES
            self.max_steps_per_episode = MAX_STEPS_PER_EPISODE
            self.epsilon = EPSILON
            self.epsilon_decay_rate = EPSILON_DECAY_RATE
            self.epsilon_min = EPSILON_MIN
            self.epsilon_max = EPSILON_MAX
            self.discount_factor = DISCOUNT_FACTOR
            self.replay_memory_size = REPLAY_MEMORY_SIZE
            self.replay_memory = deque(maxlen=self.replay_memory_size)
            self.batch_size = BATCH_SIZE
            self.train_start = TRAIN_START
            self.action_space = ACTION_SPACE
            self.action_size = len(self.action_space)
            self.state_size = STATE_SIZE
            self.learning_rate = LEARNING_RATE
            self.model = DQN(self.state_size, self.action_size, self.learning_rate)

        def select_action(self, state):
            random_value = np.random.rand()
            if random_value < self.epsilon:
                print('random_value = ', random_value)
                chosen_action = random.choice(self.action_space)  # = EXPLORATION Strategy
                print('Agent randomly chooses the following EXPLORATION action:', chosen_action)
            else:
                print('random_value = {} is greater than epsilon'.format(random_value))
                state = np.float32(state)  # Transforming passed state into numpy array
                prediction_by_model = self.model.predict(state)
                chosen_action = np.argmax(prediction_by_model[0])  # = EXPLOITATION strategy
                print('NN chooses the following EXPLOITATION action:', chosen_action)
            return chosen_action

    if __name__ == "__main__":
        agent_test = Agent(NUM_EPISODES, MAX_STEPS_PER_EPISODE, EPSILON, EPSILON_DECAY_RATE,
                           EPSILON_MIN, EPSILON_MAX, DISCOUNT_FACTOR, REPLAY_MEMORY_SIZE,
                           BATCH_SIZE, TRAIN_START)
        # Test of select_action function:
        state = 10
        state = np.array(state)
        print(state.shape)
        print(agent_test.select_action(state))

Here's the error I get when running this code:
**ValueError**: Error when checking input: expected dense_209_input to have 2 dimensions, but got array with shape ()
I am unsure why the error regarding 2 dimensions occurs since I have configured the NN in the DQN class to receive only 1 dimension.
I have already read through similar questions on Stack Overflow (Keras Sequential model input shape, Keras model input shape wrong, Keras input explanation: input_shape, units, batch_size, dim, etc). However, I have not yet been able to adapt the suggestions to my use case.
Do you have any suggestions or hints? Thank you for your help!
There are several problems here. First, what you call state_size is actually the size of your state space, i.e. the number of possible states your agent can be in. The size of a single state is 1, since there is only one parameter you want to pass as a state.
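When you define your input layer here:
    self.model.add(Dense(128, input_dim=self.state_size, activation='relu'))
you declare that each input sample has 11 features, but when you call the prediction, you pass it a single number (10). Note that input_dim only counts the features per sample; Keras also expects a leading batch dimension, which is why the error message asks for 2 dimensions.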
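To make the shape requirement concrete, here is a rough NumPy sketch of what a dense layer computes. This is not Keras code: dense_forward, kernel and bias are names I made up for illustration, and the weights are random.

```python
import numpy as np

def dense_forward(inputs, kernel, bias):
    """Hypothetical stand-in for a dense layer: ReLU(inputs @ kernel + bias)."""
    return np.maximum(inputs @ kernel + bias, 0.0)

rng = np.random.default_rng(0)
input_dim, units = 11, 128                      # same sizes as the first layer above
kernel = rng.normal(size=(input_dim, units))    # one weight row per input feature
bias = np.zeros(units)

# One sample with 11 features: shape (batch_size, input_dim) = (1, 11).
batch_ok = np.zeros((1, input_dim), dtype=np.float32)
out = dense_forward(batch_ok, kernel, bias)
print(out.shape)

# One sample with a single feature does not fit a kernel built for 11 features.
batch_bad = np.array([[10.0]])                  # shape (1, 1)
try:
    dense_forward(batch_bad, kernel, bias)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```

The point is only that the second axis of the input has to match input_dim, and the first axis is the batch.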
So you either need to modify input_dim to receive only one number, or you can define your state vector like state = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), each number corresponding to a possible state (from 9.5 to 10.5). So when the state is 9.5 your state vector is [1, 0, 0, ..., 0], and so on.
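If you go with the second option, a small helper along these lines could build the vector for you. Treat it as a sketch: one_hot_state and its parameters are hypothetical names, and it assumes the states are exactly the 11 values from 9.5 to 10.5 in steps of 0.1. It returns shape (1, 11) so the result already includes the batch dimension.

```python
import numpy as np

def one_hot_state(value, low=9.5, step=0.1, num_states=11):
    """Map a scalar state to a one-hot row vector of shape (1, num_states).

    Hypothetical helper; assumes states are low, low + step, ... (num_states values).
    """
    # round() absorbs floating-point noise such as (9.6 - 9.5) / 0.1 = 0.9999...
    index = int(round((value - low) / step))
    if not 0 <= index < num_states:
        raise ValueError("state {} is outside the state space".format(value))
    encoded = np.zeros((1, num_states), dtype=np.float32)
    encoded[0, index] = 1.0
    return encoded

encoded_low = one_hot_state(9.5)    # first position set
encoded_mid = one_hot_state(10.0)   # sixth position set (index 5)
encoded_high = one_hot_state(10.5)  # last position set
print(encoded_mid.shape)
```

With this encoding you can keep input_dim=11 and pass the returned array to predict.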
The second problem is that when you define your state you should put square brackets around the value:
    state = np.array([10])
This gives the array shape (1,); otherwise its shape is (), as I am sure you've found out.
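Here is a quick way to compare the shapes. The last two lines assume you changed input_dim to 1 and want to pass an explicit batch of one sample with one feature; the variable names are just for illustration.

```python
import numpy as np

scalar = np.array(10)      # 0-d array, shape ()
vector = np.array([10])    # 1-d array, shape (1,)

# Explicit (batch_size, features) layout: one sample, one feature.
batch = np.array([[10]], dtype=np.float32)
reshaped = np.asarray(10, dtype=np.float32).reshape(1, -1)

print(scalar.shape, vector.shape, batch.shape, reshaped.shape)
# prints: () (1,) (1, 1) (1, 1)
```

Using reshape(1, -1) is handy inside select_action because it works regardless of whether the caller passed a bare number or a one-element array.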
Hope it helps! Let me know if you need any clarification.