Tags: graph, machine-learning, artificial-intelligence, 2d-games

Improve a puzzle game AI using machine learning


My motivation for asking this question is that I have found an interesting problem using machine learning on a graph data set. There are papers on this subject, for example "Learning from Labeled and Unlabeled Data on a Directed Graph" (Zhou, Huang, Schölkopf). However, I do not have a background in artificial intelligence or machine learning, so I would like to write a smaller program for a more general audience before working on anything scientific.

Several years ago I wrote a game called Solumns. It is an evil variant of the classic Sega game Columns. Inspired by bastet, it brute-forces colour combinations that are disadvantageous to the player. It is hard.

I would like to improve upon its AI. I figure the game space (grid of coloured blocks, column positions, column colours) fits a graph structure better than a list of attributes. If that is the case, then this problem is similar to my research problem.

I am considering using the following high-level plan to solve this problem:

  1. Have the AI opponent assign a fitness rating to each possible move, based on more data than just the number of squares on the board after the move. I am thinking of using a categoriser: train on the current move and all past moves, using the course of the rest of the game as the measure of success.
  2. Develop a player bot that can beat the standard AI opponent. This could be useful for generating data for 1.
  3. Use a sample of the player bot's games to build an AI that beats the strategic player. Maybe use this data for 1, too.
  4. Write a fun AI that delegates to some combination of 1, 3, and the original AI when appropriate, using experimentation to find heuristic fudge factors.
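Step 1 could start from a hand-written fitness function over board features, with the trained categoriser later replacing the hand-set weights. A sketch in Ruby; the grid encoding (array of columns, bottom-up colour symbols) and the feature set are my own assumptions, not Solumns' actual representation:

```ruby
# Hypothetical board representation: an array of columns, each an array
# of colour symbols from the bottom up, e.g. [[:r, :r], [:g], []].

def features(grid)
  heights = grid.map(&:length)
  # Vertically adjacent same-colour pairs: a rough proxy for how close
  # the player is to clearing blocks.
  pairs = grid.sum { |col| col.each_cons(2).count { |a, b| a == b } }
  { max_height: heights.max, total_blocks: heights.sum, colour_pairs: pairs }
end

# Hand-set weights for illustration; the categoriser from step 1 would
# supply learned values instead.
WEIGHTS = { max_height: -2.0, total_blocks: -1.0, colour_pairs: 3.0 }

def fitness(grid)
  features(grid).sum { |key, value| WEIGHTS[key] * value }
end
```

The point is only the shape of the interface: any model that maps a feature hash to a score can slot in behind `fitness` without the rest of the game caring.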

To build the player bot, I figured I could use brute force to compute the sample space, then use machine learning techniques such as those used in building random forests to create some kind of decision maker.
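The brute-force layer might look like the sketch below: enumerate every placement of the falling piece, score each, keep the best. The `score` heuristic is a placeholder where the learned decision maker (e.g. a random forest over recorded placements) would plug in; the grid and piece encodings are assumptions for illustration:

```ruby
# Enumerate every placement of a falling column piece: each grid column
# crossed with each cyclic rotation of the piece.

def rotations(piece)
  (0...piece.length).map { |i| piece.rotate(i) }
end

def placements(grid, piece)
  grid.each_index.flat_map { |x| rotations(piece).map { |rot| [x, rot] } }
end

# Placeholder heuristic: keep the stack low. A trained model would
# replace this with a learned score.
def score(grid, x, _rot)
  -grid[x].length
end

def best_placement(grid, piece)
  placements(grid, piece).max_by { |x, rot| score(grid, x, rot) }
end
```

For a 6-column grid and a 3-block piece that is only 18 placements per turn, so exhaustive enumeration is cheap; the interesting part is entirely in the scoring.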

Building the AI opponent is where I am most perplexed.

Specific questions then:

  • Rating moves sounds like the kind of thing people do with chess, and although I admit my approach may be naive, there is a lot about this in the literature and I can learn from it. The question is: should the player bot and the AI opponent create the data sample? It sounds like I am mixing up different sample sets, which sounds like a recipe for bad training. Should I just play the game a lot myself?
  • What kind of algorithm should I consider for training the player bot against the current AI?
  • What kind of algorithm should I consider for training an AI opponent against the player bot?

Extra info:

  • I am deliberately not asking whether this strategy is a good fit for programming a game AI. Sure, you might be able to write a great AI more simply (after all, the game is already difficult to play). I want to learn about machine learning by doing something fun.
  • The original game was written in a mixture of Racket and C. I am porting it to JRuby for various reasons, likely with extensions or RPC calls to another, faster language. I am not so interested in existing language-specific solutions here. I want to develop skills in this area and am not afraid to implement an algorithm myself.

You can get the source for the original game here


Solution

  • I would not go for machine learning here. Look at game-playing AIs.

    You have an adversarial game (like Go) with two asymmetric players:

    • the user, who places the pieces,
    • and the computer, who chooses the pieces (instead of the pieces being chosen by chance).

    I would probably start with Monte Carlo Tree Search.
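    To make the suggestion concrete, here is a minimal UCT-flavoured MCTS sketch in Ruby, run against a toy Nim game standing in for Solumns. The game interface (`moves`, `play`, `terminal?`, `winner`, `turn`) is my own assumption; the chooser/placer asymmetry would map onto the two alternating turns of the real game.

```ruby
# Toy stand-in game: Nim with `counters` left; take 1-3 per turn, and
# whoever takes the last counter wins. `turn` is the player to move.
Nim = Struct.new(:counters, :turn) do
  def moves; [1, 2, 3].select { |m| m <= counters }; end
  def play(m); Nim.new(counters - m, turn == :a ? :b : :a); end
  def terminal?; counters.zero?; end
  def winner; turn == :a ? :b : :a; end # the player who just moved won
end

class Node
  attr_reader :state, :parent, :move
  attr_accessor :wins, :visits, :children

  def initialize(state, parent = nil, move = nil)
    @state, @parent, @move = state, parent, move
    @wins, @visits, @children = 0.0, 0, nil
  end

  # UCB1: balance exploitation (win rate) against exploration.
  def uct_child(c = Math.sqrt(2))
    children.max_by do |ch|
      next Float::INFINITY if ch.visits.zero?
      ch.wins / ch.visits + c * Math.sqrt(Math.log(visits) / ch.visits)
    end
  end
end

def mcts(root_state, iterations)
  root = Node.new(root_state)
  iterations.times do
    # 1. Selection: walk down already-expanded nodes via UCB1.
    node = root
    node = node.uct_child while node.children
    # 2. Expansion: create all children of a non-terminal leaf.
    unless node.state.terminal?
      node.children = node.state.moves.map { |m| Node.new(node.state.play(m), node, m) }
      node = node.children.sample
    end
    # 3. Simulation: random playout to the end of the game.
    state = node.state
    state = state.play(state.moves.sample) until state.terminal?
    winner = state.winner
    # 4. Backpropagation: credit wins to nodes the winner moved into.
    while node
      node.visits += 1
      node.wins += 1 if node.parent && node.parent.state.turn == winner
      node = node.parent
    end
  end
  root.children.max_by(&:visits).move
end
```

    With a real Solumns state behind the same interface, the computer's "choose the next column" turn and the simulated player's "place it" turn would alternate through the same tree, which is exactly the asymmetric adversarial structure described above.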