Search code examples
neural-networkartificial-intelligencereinforcement-learning

How should one set up the immediate reward in a RL program?


I want my RL agent to reach the goal as quickly as possible and at the same time to minimize the number of times it uses a specific resource T (which sometimes though is necessary).

I thought of setting up the immediate rewards as -1 per step, an additional -1 if the agent uses T and 0 if it reaches the goal.

But the additional -1 is completely arbitrary, how do I decide how much punishment should the agent get for using T?


Solution

  • You should use a reward function which mimics your own values. If the resource is expensive (valuable to you), then the punishment for consuming it should be harsh. The same thing goes for time (which is also a resource if you think about it).

    If the ratio between the two punishments (the one for time consumption and the one for resource consumption) is in accordance to how you value these resources, then the agent will act precisely in your interest. If you get it wrong (because maybe you don't know the precise cost of the resource nor the precise cost of slow learning), then it will strive for a pseudo optimal solution rather than an optimal one, which in a lot of cases is okay.