We know Q-learning needs tons of calculations:
The huge amount of states in q-learning calculation
A game-playing AI needs far more Q-values than the OX game (tic-tac-toe) or Go.
How can such a large number of Q-values be calculated?
Thanks.
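For reference, here is a minimal sketch of tabular Q-learning. It shows why the cost grows with the size of the state space: every (state, action) pair needs its own stored value. The function name `q_learning_update`, the step size `alpha`, the discount `gamma` and the toy states are illustrative assumptions. Terminal states are not handled.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Assumes next_state is non-terminal (illustrative sketch)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# The table holds one entry per (state, action) pair it has touched,
# so its size is bounded by |states| * |actions|.
q_table = defaultdict(float)
actions = ["left", "right"]
q_learning_update(q_table, "s0", "right", 1.0, "s1", actions)
print(q_table[("s0", "right")])  # 0.1
```

With billions of states, or more, this table can neither be stored nor filled in by visiting each entry. That is the problem the question is asking about.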
MCTS doesn't actually reduce the amount of calculation needed for the Q-values.
Even for a very simple Atari game, an AI needs far more than 3^(19x19) Q-values.
Look at the deep Q-network (DQN); it addresses exactly this problem.
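The following sketch puts the comparison with Go in numbers. It counts decimal digits with logarithms instead of building the huge integers. The Go figure, 3^(19x19), is an upper bound that includes illegal positions. The Atari figure assumes the preprocessing described in the linked article: the last four frames, resized to 84x84, with 256 gray levels. Treat both as illustrations, not exact counts for every agent. The helper `decimal_digits` is hypothetical.

```python
import math

def decimal_digits(base: int, exponent: int) -> int:
    """Number of decimal digits in base**exponent, computed via logarithms
    so the huge integer never has to be materialized (hypothetical helper)."""
    return math.floor(exponent * math.log10(base)) + 1

# Go upper bound: each of the 19x19 points is empty, black or white.
go_states = decimal_digits(3, 19 * 19)

# Atari, assuming DQN-style preprocessing: 4 stacked 84x84 frames, 256 gray levels.
atari_states = decimal_digits(256, 84 * 84 * 4)

print(go_states)     # 173
print(atari_states)  # 67971
```

This gives roughly 10^172 board configurations for Go, compared with roughly 10^67970 preprocessed Atari states. A table with one Q-value per state-action pair is out of the question in both cases.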
We could represent our Q-function with a neural network, that takes the state (four game screens) and action as input and outputs the corresponding Q-value. Alternatively we could take only game screens as input and output the Q-value for each possible action. This approach has the advantage, that if we want to perform a Q-value update or pick the action with highest Q-value, we only have to do one forward pass through the network and have all Q-values for all actions immediately available.
https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
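The sketch below illustrates the second design from the quote. The network takes only the state as input and returns one Q-value per action, so a single forward pass serves both action selection and the Q-learning target. It is a toy pure-Python MLP, not DQN itself. The sizes `N_FEATURES`, `N_ACTIONS` and `HIDDEN` are hypothetical: real DQN uses convolutional layers over 4x84x84 frames. Biases and training are omitted for brevity.

```python
import random

random.seed(0)

N_FEATURES = 8   # hypothetical flattened state size (DQN uses 4 stacked 84x84 frames)
N_ACTIONS = 4    # hypothetical number of discrete actions
HIDDEN = 16      # hypothetical hidden-layer width

def init_layer(n_in, n_out):
    """Random weight matrix with n_out rows of n_in weights (no biases, for brevity)."""
    return [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def dense(weights, x, relu=True):
    """Fully connected layer: matrix-vector product, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return [max(0.0, v) for v in out] if relu else out

W1 = init_layer(N_FEATURES, HIDDEN)
W2 = init_layer(HIDDEN, N_ACTIONS)

def q_values(state):
    """One forward pass returns Q(s, a) for every action a."""
    return dense(W2, dense(W1, state), relu=False)

def greedy_action(state):
    """Action with the highest Q-value, from a single forward pass."""
    qs = q_values(state)
    return max(range(N_ACTIONS), key=lambda a: qs[a])

def td_target(reward, next_state, gamma=0.99, done=False):
    """Q-learning target r + gamma * max_a' Q(s', a'), again from one forward pass."""
    return reward if done else reward + gamma * max(q_values(next_state))

state = [random.random() for _ in range(N_FEATURES)]
next_state = [random.random() for _ in range(N_FEATURES)]
print(len(q_values(state)))   # 4
print(greedy_action(state))
print(td_target(1.0, next_state))
```

The other design takes the state and the action as input and returns a single Q-value. With that design, `greedy_action` and `td_target` would each need `N_ACTIONS` forward passes instead of one. In both designs the network's weights replace the table, so the number of parameters no longer grows with the number of states.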