
python - EXTREMELY low accuracy (<1%) on homemade chess engine


I am into chess. And no, I do not plan on using this engine for cheating. Right now it has less than 1% accuracy; I am without a doubt better than it. It generates moves in the syntax I want, like d4e5 (piece on d4 captures on e5). However, it is so inaccurate that it often suggests illegal moves, like d6c7 when there is no piece on d6 or c7, or moves that are illegal in other ways. Its accuracy is 0.6%; even 1980s chess bots could do better. So, how would I increase the accuracy of my model? My goal is at least 30%.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

df = pd.read_csv('train.csv')

train_df = df[:800]
test_df = df[800:]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train_features = vectorizer.fit_transform(train_df['board'])

clf = LinearSVC(random_state=0, tol=1e-5)
clf.fit(X_train_features, train_df['best_move'])

X_test_features = vectorizer.transform(test_df['board'])
predicted_moves = clf.predict(X_test_features)

correct_moves = predicted_moves == test_df['best_move']
accuracy = sum(correct_moves) / len(predicted_moves)
print("Accuracy:", accuracy)

import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)
with open('vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)

with open('model.pkl', 'rb') as f:
    clf = pickle.load(f)
with open('vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)

new_data = pd.DataFrame({'board': ['8/8/2B1p3/6k1/6P1/1K6/3P4/6bQ w - - 0 1']})

new_data_features = vectorizer.transform(new_data['board'])

predicted_move = clf.predict(new_data_features)[0]

print(predicted_move)

Solution

  • There are a number of choices here that mean this model will essentially never do what you want.

    Input encoding

    Let's consider two similar boards, with white to move in both.

    First board: https://lichess.org/analysis/4k3/6Q1/8/3p4/8/BK6/8/8_w_-_-_0_1?color=white

    Here, the optimal move is Qe7#.

    Second board: https://lichess.org/analysis/4k3/6Q1/3p4/8/8/BK6/8/8_w_-_-_0_1?color=white

    Here, playing Qe7 will blunder the queen. White can, at best, draw the game after that.

    Now, look at the FEN for each board state:

    4k3/6Q1/8/3p4/8/BK6/8/8 w - - 0 1
    4k3/6Q1/3p4/8/8/BK6/8/8 w - - 0 1
    

    The FEN differs only by the order of the 8 and 3p4 tokens. Let's see what happens to them after being transformed by TfidfVectorizer:

    from sklearn.feature_extraction.text import TfidfVectorizer
    positions = [
        "4k3/6Q1/8/3p4/8/BK6/8/8 w - - 0 1",
        "4k3/6Q1/3p4/8/8/BK6/8/8 w - - 0 1",
    ]
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    vectorizer.fit(positions)
    for position in positions:
        print(position, vectorizer.transform([position]).todense())
    

    Output:

    4k3/6Q1/8/3p4/8/BK6/8/8 w - - 0 1 [[0.37796447 0.37796447 0.37796447 0.37796447 0.37796447 0.37796447
      0.37796447]]
    4k3/6Q1/3p4/8/8/BK6/8/8 w - - 0 1 [[0.37796447 0.37796447 0.37796447 0.37796447 0.37796447 0.37796447
      0.37796447]]
    

    These two boards vectorize to the same thing. No matter how good your model is, it cannot tell these two boards apart. TfidfVectorizer is based on CountVectorizer, which assumes that the order of the tokens does not matter. However, this is chess, and the order of the pieces does matter. (CountVectorizer's default token pattern also drops single-character tokens, so the model does not even know whose turn it is.)
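One way around this, sketched below, is a position-aware encoding: one-hot encode each of the 64 squares (plus a side-to-move flag) instead of counting tokens. `fen_to_vector` is an illustrative helper I'm assuming here, not part of the original code, and it only reads the piece-placement and turn fields of the FEN.

```python
# A minimal position-aware alternative to TfidfVectorizer: one-hot
# encode each of the 64 squares, plus a flag for the side to move.
PIECES = "PNBRQKpnbrqk"  # the 12 piece letters used in FEN

def fen_to_vector(fen):
    placement, turn = fen.split()[:2]
    vec = []
    for row in placement.split("/"):
        for ch in row:
            if ch.isdigit():
                # A digit in FEN is a run of empty squares.
                vec.extend([0.0] * (len(PIECES) * int(ch)))
            else:
                square = [0.0] * len(PIECES)
                square[PIECES.index(ch)] = 1.0  # which piece sits here
                vec.extend(square)
    vec.append(1.0 if turn == "w" else 0.0)  # whose turn it is
    return vec

a = fen_to_vector("4k3/6Q1/8/3p4/8/BK6/8/8 w - - 0 1")
b = fen_to_vector("4k3/6Q1/3p4/8/8/BK6/8/8 w - - 0 1")
print(a == b)  # False: the two boards now encode differently
```

Because every square's contents stay tied to its location, the two positions above are no longer aliased to the same vector.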

    Output encoding

    This code is using a LinearSVC to classify moves. What happens if the best move is a move that did not appear during training? In this case, the model is incapable of outputting that move.

    I did an analysis of 10 million moves from random games on Lichess. Of those, 10% were not among the 800 most popular moves (at least in algebraic notation). With 800 training examples, the model can choose between at most 800 distinct moves.

    Instead, I would suggest outputting an evaluation of which side is winning, and by how much. This changes it from a classification problem with 800 outputs to a regression problem with 1 output. You can evaluate each legal move, and pick the one the model thinks is strongest.
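As a sketch of that idea, with `legal_moves`, `apply_move`, and `evaluate` as hypothetical stand-ins for your move generator and trained regression model (the toy demo below just hard-codes a few moves and scores):

```python
def pick_best_move(board, legal_moves, apply_move, evaluate):
    """Score the position reached by each legal move with a regression
    model and return the highest-scoring one for the side to move."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves(board):
        score = evaluate(apply_move(board, move))
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Toy demonstration with hard-coded moves and evaluations:
scores = {"e2e4": 0.4, "d2d4": 0.3, "g1f3": 0.1}
best = pick_best_move(
    "startpos",
    lambda board: list(scores),         # stand-in move generator
    lambda board, move: move,           # stand-in "position after move"
    lambda position: scores[position],  # stand-in regression model
)
print(best)  # e2e4
```

Since the candidate set is the legal moves of the current position, the model can never suggest an illegal move, and it is not limited to moves seen in training.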

    Game tree search

    Game tree search is really strong. It's hard to think of any engine (aside from ones constructed as a joke) that doesn't do some kind of game tree search. If your goal is to create a strong chess engine, you should probably look into minimax or Monte Carlo tree search. On the other hand, if your goal is to have fun writing a chess engine, then it doesn't need to be strong to be interesting.
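For reference, here is a bare-bones minimax sketch over a toy two-ply game tree; a real engine would plug in its move generator and evaluation function (and would normally add alpha-beta pruning):

```python
def minimax(state, depth, maximizing, children, evaluate):
    """Plain minimax: assume both sides pick the move best for
    themselves, and return the value of `state` `depth` plies ahead."""
    moves = children(state)
    if depth == 0 or not moves:
        return evaluate(state)
    values = (minimax(s, depth - 1, not maximizing, children, evaluate)
              for s in moves)
    return max(values) if maximizing else min(values)

# Toy tree: the opponent minimizes on the second ply, so the root
# is worth min(3, 5) vs. min(2, 9), i.e. max(3, 2) = 3.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf_values = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
print(minimax("root", 2, True, lambda s: tree.get(s, []),
              lambda s: leaf_values.get(s, 0)))  # 3
```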