machine-learning · tensorflow · deep-learning · reinforcement-learning · q-learning

Reward function for learning to play Curve Fever game with DQN


I've made a simple version of Curve Fever, also known as "Achtung, die Kurve". I want the machine to figure out how to play the game optimally. I copied and slightly modified an existing DQN implementation from some Atari game examples, built with Google's TensorFlow.

I'm trying to figure out an appropriate reward function. Currently, I use this reward setup:

  • 0.1 for every frame it does not crash
  • -500 for every crash

Is this the right approach? Do I need to tweak the values? Or do I need a completely different approach?


Solution

  • The reward of -500 can destroy your network. You should scale the rewards to values between -1 and 1. (Also scale the input image to the range [-1, 1] or [0, 1].)

    Just give your network a reward of -1 for crashing and a reward of +1 once an enemy crashes. Without enemies a reward of -1 for crashing should be enough. Having a small constant positive living reward can be beneficial in some situations (like when the network has to decide between two inevitable crashes of which one will happen faster than the other) but it will also make the learning of the Q-function more complicated. You can just try with and without a constant reward and see what works best.

    The example with an inevitable crash also shows why you should not use a small negative living reward. In such a case the network would choose the path of the fastest crash, while delaying the crash as much as possible would be the better strategy in that situation.
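
    The reward and input scaling described above could be sketched roughly like this (a minimal illustration, not tied to any specific DQN codebase; the function names and the optional `living_reward` parameter are hypothetical):

    ```python
    import numpy as np

    def compute_reward(crashed, enemy_crashed, living_reward=0.0):
        """Return a reward scaled to [-1, 1].

        crashed:       True if the agent's curve just crashed.
        enemy_crashed: True if an enemy's curve just crashed.
        living_reward: optional small constant per surviving frame
                       (e.g. 0.0 or 0.01); experiment to see what works.
        """
        if crashed:
            return -1.0
        if enemy_crashed:
            return 1.0
        return living_reward

    def preprocess_frame(frame):
        """Scale 8-bit pixel values from [0, 255] down to [0, 1]."""
        return frame.astype(np.float32) / 255.0
    ```

    With this setup the Q-values stay in a range the network can represent comfortably, instead of being dominated by a single -500 spike.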