I'm working with A2C reinforcement learning in an environment where the number of agents increases and decreases over time. As the number of agents changes, the size of the state space changes with it. I have tried to handle the changing state space as follows:
If the state space exceeds the maximum size selected as n_input, the excess entries are subsampled with np.random.choice, which draws a random sample from the state after converting the state values into probabilities.
If the state space is smaller than n_input, I pad the state with zeros.
def get_state_new(state):
    state = np.asarray(state, dtype=float)
    # softmax over the raw state values gives sampling probabilities
    p = np.exp(state - state.max())
    p = p / p.sum()
    if len(state) > n_input:
        # too many features: subsample down to n_input
        statappend = np.random.choice(state, size=n_input, p=p)
    else:
        # too few features: pad with zeros up to n_input
        statappend = np.zeros(n_input)
        statappend[:len(state)] = state
    return statappend
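A self-contained sketch of the same idea (the value of n_input and the standalone state argument are illustrative; my real code reads the state from the environment):

```python
import numpy as np

N_INPUT = 6  # hypothetical maximum state size

def fit_state(state, n_input=N_INPUT):
    """Pad with zeros or subsample so the state has exactly n_input entries."""
    state = np.asarray(state, dtype=float)
    if len(state) > n_input:
        # softmax over the raw values gives sampling probabilities
        p = np.exp(state - state.max())
        p /= p.sum()
        # draw n_input distinct entries, weighted by their probabilities
        return np.random.choice(state, size=n_input, replace=False, p=p)
    # shorter state: pad the tail with zeros
    padded = np.zeros(n_input)
    padded[:len(state)] = state
    return padded

short = fit_state([1.0, 2.0, 3.0])   # padded to length 6
long_ = fit_state(np.arange(10.0))   # subsampled to length 6
```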
It works, but the results are not as expected, and I don't know whether this approach is correct.
My question
Are there any reference papers that deal with such a problem, and how should a changing state space be handled?
I tried several different solutions, but I found that encoding the state is the best solution for my problem.
[1] mentions that extra connected autonomous vehicles (CAVs) are not included in the state, and that if there are fewer CAVs than the maximum, the state is padded with zeros. We can select how many agents' states to share and add to the agent's own state. For the encoder, I use the Encoder from the Neural machine translation with attention code:
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.enc_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')

    def call(self, x, hidden):
        x = self.embedding(x)
        output, state = self.gru(x, initial_state=hidden)
        return output, state

    def initialize_hidden_state(self):
        return tf.zeros((self.batch_sz, self.enc_units))
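To see how this produces a fixed-size state summary regardless of sequence length, the encoder can be exercised like this (all sizes below are illustrative, and the integer-token input assumes the state values have been discretized into bins first, since the Embedding layer expects token ids):

```python
import tensorflow as tf

class Encoder(tf.keras.Model):
    # same Encoder as above
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.enc_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')

    def call(self, x, hidden):
        x = self.embedding(x)
        output, state = self.gru(x, initial_state=hidden)
        return output, state

    def initialize_hidden_state(self):
        return tf.zeros((self.batch_sz, self.enc_units))

# hypothetical sizes: 50 discretization bins, 8-dim embedding,
# 16 GRU units, batch of 4 sequences of length 10
encoder = Encoder(vocab_size=50, embedding_dim=8, enc_units=16, batch_sz=4)
tokens = tf.random.uniform((4, 10), maxval=50, dtype=tf.int32)
output, state = encoder(tokens, encoder.initialize_hidden_state())
# output holds per-step features (4, 10, 16);
# state is the fixed-size summary (4, 16) fed to the agent
```

The key point is that the final GRU state has shape (batch_sz, enc_units) no matter how many tokens were fed in, so a variable number of agent observations can be collapsed into a fixed-size vector.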
1- Vinitsky, E., Kreidieh, A., Le Flem, L., Kheterpal, N., Jang, K., Wu, C., ... & Bayen, A. M. (2018, October). Benchmarks for reinforcement learning in mixed-autonomy traffic. In Conference on Robot Learning (pp. 399-409).
2- Kochkina, E., Liakata, M., & Augenstein, I. (2017). Turing at SemEval-2017 Task 8: Sequential approach to rumour stance classification with branch-LSTM. arXiv preprint arXiv:1704.07221.
3- Ma, L., & Liang, L. (2020). Enhance CNN Robustness Against Noises for Classification of 12-Lead ECG with Variable Length. arXiv preprint arXiv:2008.03609.
4- How to feed LSTM with different input array sizes?
5- Zhao, X., Xia, L., Zhang, L., Ding, Z., Yin, D., & Tang, J. (2018, September). Deep reinforcement learning for page-wise recommendations. In Proceedings of the 12th ACM Conference on Recommender Systems (pp. 95-103).