I am making an AI for a zero-sum 4-player board game. It's actually not zero-sum (the 4 players will "die" when they lose all their lives, so there will be a player who died first, second, third and a player who survived. However, I am telling the AI that only surviving counts as a win and anything else is a lose) After some research, I figured I would use a minimax algorithm in combination with a heuristic function. I came across this question and decided to do the same as the OP of that question - write an evolutionary algorithm that gives me the best weights.
However, my heuristic function is different from the one the OP of that question had. Mine takes 9 weights and is a lot slower, so I can't let the agents play 1000 games (takes too much time) or breed them with the crossover method (how do I do a crossover with 9 weights?).
So I decided to come up with my own method of determining fitness and breeding. And this question is only about the fitness function.
Here are my attempts at this.
For each agent A
in a randomly generated population of 50 agents, select 3 more agents from the population (with replacement but not the same agent as A
itself) and let the 4 agents play a game where A
is the first player. Select another 3 and play a game where A
is the second player, and so on. For each of these 4 games, if A
died first, its fitness does not change. If A
died second, its fitness is increased by 1. If it died third, its fitness is increased by 2. If it survived, its fitness is increased by 3. Therefore, I concluded that the highest fitness one can get is 12 (surviving/wining all 4 games -> 3 + 3 + 3 + 3).
I ran this for 7 generations and starting from the first generation, the highest fitness is as high as 10. And I calculated the average fitness of the top 10 agents, but the average didn't increase a bit throughout the 7 generations. It even decreased a little.
I think the reason why this didn't work is because there's gotta be a few agents that got lucky and got some poor performing agents as its opponents.
The game setups are the same as my first attempt but instead of measuring the results of each game, I decided to measure how many moves did that agent make before it died.
After 7 generations the average fitness of top 10 does increase but still not increasing as much as I think it should.
I think the reason why this failed is that the game is finite, so there is a finite number of moves you can make before you die and the top performing agents pretty much reached that limit. There is no room for growth. Another reason is that the fitness of the player who survived and the fitness of the player who died third differs little.
From my understanding of EAs (correct me if I'm wrong), the average fitness should increase and the top performing individual's fitness should not decrease over time.
My two attempts failed at both of these. Since the opponents are randomly selected, the top performing agent in generation 1 might get stronger opponents in the next generation, and thus its fitness decreases.
In my attempts, the agents play 200 games each generation and each generation takes up to 3 hours, so I don't want to let them play too many games.
How can I write a fitness function like this?
Seven generations doesn't seem like nearly enough to get a useful result. Especially for a game, I would expect something like 200+ generations to be more realistic. You could do a number of things:
Implement elitism in order to ensure the survival of the best individual(s).
The strength of evolution stems from repeated mutation and crossover, so I'd recommend letting the agents play only a few games per generation (say, 5 ~ 10), at least at the beginning, and then evolve the population. You might even want to do only one game per generation.
In this regard, you could adopt a continuous evolution strategy. What this means is that as soon as an agent dies, they are subjected to mutation, and as soon as an agent wins, they can produce offspring. Or any combination of the two. The point is that the tournament is ongoing, everyone can play against anyone else. This is a little more "organic" in the sense that it does not have strictly defined generations, but it should speed up the process (especially if you can parallelise the evaluation).
I hope that helps. The accepted answer in the post you referenced has a good suggestion about the way you could implement crossover.