sql c tensorflow machine-learning reinforcement-learning

How does DeepMind reduce the computation of Q-values for Atari games?

We know that Q-learning needs a huge amount of computation:

The huge amount of states in q-learning calculation

A game-playing AI needs far more Q-values than a game like tic-tac-toe (OX) or Go.

How is such a large number of Q-values calculated in practice?

Thanks.

Solution

MCTS does not actually reduce the amount of computation needed for Q-values.

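Even a very simple Atari game-playing AI needs far more than 3^(19x19) Q-values, which is already the upper bound on the number of Go board configurations (each of the 19x19 points is empty, black, or white).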
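To get a feel for the gap, here is a rough back-of-the-envelope comparison. The Atari input size is an assumption that the answer itself does not state: four stacked 84x84 grayscale frames with 256 gray levels per pixel, which is the preprocessing commonly described for DQN. The numbers are compared as powers of ten because the raw integers are too large to be useful.

```python
import math

# Upper bound on Go board configurations: each of the 19x19 points
# is empty, black, or white, so there are at most 3^(19*19) boards.
go_log10 = 19 * 19 * math.log10(3)

# Assumed DQN-style input (not stated in the answer): 4 stacked
# 84x84 grayscale frames, 256 possible values per pixel.
FRAMES, HEIGHT, WIDTH, LEVELS = 4, 84, 84, 256
atari_log10 = FRAMES * HEIGHT * WIDTH * math.log10(LEVELS)

print(f"Go board configurations: about 10^{go_log10:.0f}")
print(f"Atari screen states:     about 10^{atari_log10:.0f}")
```

Under these assumptions the Go bound is on the order of 10^172, while the raw Atari screen space is on the order of 10^67970, so a lookup table with one row per state is out of the question.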

Take a look at the deep Q-network (DQN), which addresses exactly this problem.

We could represent our Q-function with a neural network, that takes the state (four game screens) and action as input and outputs the corresponding Q-value. Alternatively we could take only game screens as input and output the Q-value for each possible action. This approach has the advantage, that if we want to perform a Q-value update or pick the action with highest Q-value, we only have to do one forward pass through the network and have all Q-values for all actions immediately available.

https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
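The second variant from the quote (state in, one Q-value per action out) could look roughly like the sketch below. It is only an illustration, not DeepMind's implementation: it uses a tiny fully connected network in plain Python instead of a convolutional network in a framework such as TensorFlow, and the sizes `STATE_SIZE`, `HIDDEN_SIZE`, and `NUM_ACTIONS` as well as the helper names are made up for the example. The point is that the number of weights is fixed, no matter how many distinct states exist, and that a single forward pass yields the Q-values for all actions.

```python
import random

random.seed(0)

# Hypothetical sizes for illustration only. A real DQN takes stacked
# game screens as input and uses convolutional layers.
STATE_SIZE = 16
HIDDEN_SIZE = 8
NUM_ACTIONS = 4


def init_layer(n_in, n_out):
    """Return (weights, biases) with small random weights and zero biases."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, relu=False):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in outputs] if relu else outputs


hidden_layer = init_layer(STATE_SIZE, HIDDEN_SIZE)
output_layer = init_layer(HIDDEN_SIZE, NUM_ACTIONS)


def q_values(state):
    """One forward pass: state in, one Q-value per action out."""
    return dense(dense(state, hidden_layer, relu=True), output_layer)


def greedy_action(state):
    """Pick the index of the action with the highest Q-value."""
    qs = q_values(state)
    return max(range(len(qs)), key=qs.__getitem__)


state = [random.random() for _ in range(STATE_SIZE)]
qs = q_values(state)
print("Q-values:", qs)
print("Greedy action:", greedy_action(state))
```

The network here is untrained, so the printed Q-values are meaningless. In an actual DQN the weights are fitted so that the outputs approximate the Q-function, which replaces storing one table entry per state-action pair.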