reinforcement-learning openai-gym stable-baselines keras-rl

What is the best way to model an environment to force an agent to select `x out of n` choices?

I have an RL problem where I want the agent to make a selection of x out of an array of size n.

For example, if I have [0, 1, 2, 3, 4, 5], then n = 6, and with x = 3 a valid action could be

[2, 3, 5].

So far I have tried using n scores: the agent outputs n continuous numbers, and I select the x highest ones. This works reasonably well.
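To make the score-based approach concrete, here is a minimal sketch of the decoding step, assuming the policy emits one real-valued score per item. The function name `select_top_x` and the tie-breaking rule are illustrative assumptions, not part of any library.

```python
def select_top_x(scores, x):
    """Return the sorted indices of the x highest scores.

    Assumption: ties are broken in favour of the lower index.
    """
    n = len(scores)
    if not 0 < x <= n:
        raise ValueError("x must be between 1 and n")
    # Rank indices by score, highest first; sorted() is stable, so equal
    # scores keep their original (ascending index) order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:x])


# Hypothetical policy output for n = 6 items.
scores = [0.1, -0.4, 0.9, 0.7, 0.2, 0.8]
selection = select_top_x(scores, 3)
```

Because the result is built from distinct indices, the selection is unique by construction, whatever scores the agent produces.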

I also tried a MultiDiscrete action with x values, each of which can be anything from 0 to n-1, and iteratively replaced any duplicates.
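One possible version of that duplicate-repair step is sketched below. The replacement rule (bump a repeated index to the next free one, wrapping around at n) is an assumption chosen for illustration; the rule I actually use may differ, and `repair_duplicates` is a hypothetical helper name.

```python
def repair_duplicates(action, n):
    """Make a MultiDiscrete-style action unique.

    Walks the action left to right; any index that is already taken is
    bumped to the next free index, wrapping around at n (assumed rule).
    """
    if len(action) > n:
        raise ValueError("cannot pick more unique items than n")
    used = set()
    repaired = []
    for a in action:
        while a in used:
            a = (a + 1) % n
        used.add(a)
        repaired.append(a)
    return repaired


# A raw action with x = 3 entries in the range 0..5, containing a repeat.
raw_action = [2, 2, 3]
unique_action = repair_duplicates(raw_action, n=6)
```

A drawback of any such repair is that the executed action can differ from the one the agent sampled, which may make learning noisier.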

Is there a better action space that I am missing, one that would force the agent to make unique choices?

Many thanks for your valuable insights and tips in advance! I am happy to try all!

Solution

Since reinforcement learning is mostly about interacting with an environment, you can approach it like this:

Your agent chooses its actions one at a time. After each choice, you can either remove that choice from the set of available actions (for example, by keeping a temporary action list) or change the value of the already-chosen action (for example, by giving a negative reward if it is picked again). I think this could solve your problem.
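Both options can be sketched in a small toy environment. This is only an illustration under stated assumptions: the class `PickXOfNEnv`, its `action_mask` method, the penalty value, and the placeholder reward of 0.0 are all hypothetical, and it deliberately does not depend on Gym so that it runs on its own.

```python
import random


class PickXOfNEnv:
    """Toy episodic environment: one item is picked per step until x unique items are chosen."""

    def __init__(self, n, x, duplicate_penalty=-1.0):
        if not 0 < x <= n:
            raise ValueError("x must be between 1 and n")
        self.n, self.x = n, x
        self.duplicate_penalty = duplicate_penalty
        self.chosen = []

    def reset(self):
        self.chosen = []
        return self.action_mask()

    def action_mask(self):
        """True for every item that has not been chosen yet."""
        return [i not in self.chosen for i in range(self.n)]

    def step(self, action):
        if action in self.chosen:
            # Second option: punish a repeated choice and do not record it.
            return self.action_mask(), self.duplicate_penalty, False
        self.chosen.append(action)
        done = len(self.chosen) == self.x
        # Placeholder reward; a real task would score the selection here.
        return self.action_mask(), 0.0, done


def masked_random_policy(mask, rng):
    """First option: only sample among the actions the mask still allows."""
    return rng.choice([i for i, allowed in enumerate(mask) if allowed])


env = PickXOfNEnv(n=6, x=3)
rng = random.Random(0)
mask, done = env.reset(), False
while not done:
    mask, reward, done = env.step(masked_random_policy(mask, rng))
selection = sorted(env.chosen)
```

With a learned policy, the mask would typically be applied to the action logits before sampling, so invalid choices get zero probability. Some libraries provide this directly (for instance, `MaskablePPO` in sb3-contrib), in which case the penalty branch should rarely or never be hit.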