machine-learning, deep-learning, reinforcement-learning

Cannot understand this line of a popular deep Q-learning program


https://github.com/yenchenlin/DeepLearningFlappyBird/blob/master/deep_q_network.py#L82

I have spent a lot of time trying to understand it.

Why use tf.multiply?

I cannot find the math that supports this multiplication.


Solution

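  • The network outputs one Q-value per action.

    The action input a is a one-hot vector: 1 at the index of the action taken, 0 everywhere else.

    Multiplying the Q-values element-wise by a zeroes out every entry except the one for the chosen action, and summing the product leaves just that value. So this line simply selects the 'hot' Q-value; it is indexing written as arithmetic, not a separate step in the Q-learning math.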
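A minimal sketch of the selection trick, using plain Python lists in place of tensors. The Q-values, the actions, and the helper name `select_q` are made up for illustration. It assumes the linked line follows the usual pattern of tf.multiply followed by tf.reduce_sum over the action axis.

```python
# Hypothetical batch of Q-values: one row per state, one column per action.
q_values = [
    [0.5, 2.0],
    [1.5, -1.0],
]

# One-hot actions: the second action was taken in row 0, the first in row 1.
actions = [
    [0.0, 1.0],
    [1.0, 0.0],
]


def select_q(q_rows, one_hot_rows):
    """Multiply element-wise, then sum over the action axis.

    The one-hot zeros cancel every Q-value except the chosen action's,
    so each row's sum equals the Q-value of the action that was taken.
    """
    return [
        sum(q * a for q, a in zip(q_row, a_row))
        for q_row, a_row in zip(q_rows, one_hot_rows)
    ]


selected = select_q(q_values, actions)
print(selected)  # [2.0, 1.5]
```

Writing the selection as multiply-and-sum rather than as an index lookup keeps it a differentiable tensor operation, so the loss gradient flows only through the Q-value of the action that was actually taken.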