c++ reinforcement-learning q-learning

Learning Curve in Q-learning


I wrote a Q-learning algorithm in C++ with an epsilon-greedy policy, and now I have to plot the learning curve for the Q-values. What exactly should I plot? I have an 11x5 Q matrix, so should I pick a single Q-value and plot how it changes, or should I use the whole matrix for the learning curve? Could you guide me on this? Thank you.


Solution

  • Learning curves in RL are typically plots of returns over time, not of Q-values, Q-losses, or anything similar. So you should run your agent in the environment, compute the total reward of each episode (a.k.a. the return), and plot it against the corresponding time (for example, the episode number).
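
To make this concrete, here is a minimal sketch of recording the return of every training episode so it can be plotted afterwards. The environment is not described in the question, so everything about it here is an assumption made only to match the 11x5 Q matrix: a hypothetical corridor of 11 cells, 5 actions that move by -2 to +2, a reward of -1 per step and +10 on reaching the last cell. The names `runEpisode`, `train` and `movingAverage`, and the values of `alpha`, `gamma`, `epsilon` and `maxSteps`, are likewise illustrative and not taken from your code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical toy task (an assumption, chosen only to match an 11x5 Q-table):
// 11 corridor cells, 5 actions that move by -2..+2, goal at the last cell.
constexpr int kStates = 11;
constexpr int kActions = 5;
using QTable = std::array<std::array<double, kActions>, kStates>;

// Index of the highest Q-value in state s (ties go to the lowest index).
int greedyAction(const QTable& q, int s) {
    return static_cast<int>(std::max_element(q[s].begin(), q[s].end()) - q[s].begin());
}

// Runs one episode of epsilon-greedy Q-learning and returns the undiscounted
// sum of rewards collected in that episode (the episode return).
double runEpisode(QTable& q, std::mt19937& rng, double epsilon,
                  double alpha = 0.1, double gamma = 0.95, int maxSteps = 50) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<int> anyAction(0, kActions - 1);
    int s = 0;
    double episodeReturn = 0.0;
    for (int t = 0; t < maxSteps; ++t) {
        // Explore with probability epsilon, otherwise act greedily.
        const int a = (coin(rng) < epsilon) ? anyAction(rng) : greedyAction(q, s);
        // Action a moves by (a - 2) cells; the position is kept inside the corridor.
        const int next = std::max(0, std::min(s + (a - 2), kStates - 1));
        const bool done = (next == kStates - 1);
        const double r = done ? 10.0 : -1.0;
        // Standard Q-learning target; no bootstrapping from the terminal state.
        const double bootstrap = done ? 0.0 : gamma * q[next][greedyAction(q, next)];
        q[s][a] += alpha * (r + bootstrap - q[s][a]);
        episodeReturn += r;  // this running sum is what goes on the y-axis
        s = next;
        if (done) break;
    }
    return episodeReturn;
}

// Trains for a number of episodes and returns one return value per episode.
std::vector<double> train(int episodes, unsigned seed, double epsilon = 0.1) {
    QTable q{};  // all Q-values start at zero
    std::mt19937 rng(seed);
    std::vector<double> returns;
    returns.reserve(static_cast<std::size_t>(episodes));
    for (int e = 0; e < episodes; ++e) returns.push_back(runEpisode(q, rng, epsilon));
    return returns;
}

// Trailing moving average; the first entries average over fewer than `window` values.
std::vector<double> movingAverage(const std::vector<double>& xs, std::size_t window) {
    assert(window > 0);
    std::vector<double> out;
    out.reserve(xs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum += xs[i];
        if (i >= window) sum -= xs[i - window];
        out.push_back(sum / static_cast<double>(std::min(i + 1, window)));
    }
    return out;
}
```

To get the curve, call something like `train(500, 42u)`, write the episode index and the return (or its `movingAverage`) to a CSV file, and plot episode number on the x-axis against return on the y-axis. Raw per-episode returns are usually noisy, which is why a smoothed version is often plotted as well. Note that these returns include the exploratory actions; if you want to see how good the greedy policy is, you could additionally run evaluation episodes with `epsilon = 0` every so often and plot those returns instead.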