Tags: artificial-intelligence, reinforcement-learning

How would I clip a continuous action in an actor-critic agent?


Let's say we have a bot that has some money and some shares. The input is a list of prices for the last 30 days; it doesn't use an RNN, and all of the prices are fed in at once. The output is a single continuous action, where a positive number means buy that amount of the stock and a negative number means sell. How can I restrict the action space so that the action is clipped between how many shares the bot holds (the lower bound) and how much money it has (the upper bound)?

Should I have the action clipped, or just penalize an illegal action? Which option would produce better results?
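For reference, here is a minimal sketch of the kind of actor I have in mind (assuming PyTorch; the layer sizes and names are just illustrative):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Feed-forward actor: 30 past prices in, one unbounded trade size out."""

    def __init__(self, window: int = 30, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # positive output = buy, negative = sell
        )

    def forward(self, prices: torch.Tensor) -> torch.Tensor:
        # prices has shape (batch, window)
        return self.net(prices)
```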


Solution

  • You can penalise illegal actions, but in my experience it hasn't had a good effect on the agent (one more thing for it to worry about). Just clip the output: if the agent tries to spend more money than it has, it spends all of its money, and if it tries to sell more of a stock than it holds, it sells all of its stock. The network learns quite quickly what happens when it tries to use more resources than it has, so clipping won't degrade performance. A minimal sketch of the clipping is shown below.
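
A minimal sketch of that clipping, assuming the action is expressed as a number of shares and that the environment exposes the agent's cash balance, share count, and the current price (clip_action, cash, shares_held, and price are hypothetical names):

```python
import numpy as np

def clip_action(raw_action: float, cash: float, shares_held: float, price: float) -> float:
    """Clip a raw buy/sell action to what the agent can actually afford.

    raw_action > 0 means buy that many shares, raw_action < 0 means sell.
    """
    max_buy = cash / price    # upper bound: shares affordable with current cash
    max_sell = shares_held    # lower bound: cannot sell more shares than it holds
    return float(np.clip(raw_action, -max_sell, max_buy))

# Example: trying to buy 50 shares at $10 with only $300 in cash is clipped to 30 shares,
# and trying to sell 100 shares while holding 20 is clipped to selling all 20.
print(clip_action(50.0, cash=300.0, shares_held=20.0, price=10.0))    # 30.0
print(clip_action(-100.0, cash=300.0, shares_held=20.0, price=10.0))  # -20.0
```

Apply the clip to the actor's raw output before passing the action to the environment; the critic and the replay buffer (if any) should see the clipped action, since that is what was actually executed.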