I am trying to create a function that returns a random master chess game as a .pgn file.
My approach so far: I downloaded the Caissabase chess database, which is quite large and contains millions of games. My original plan was to simply read a random game from this file, like so:
```python
import random

import chess.pgn


def extract_random_game(pgn_file, output_file):
    random_game = None
    num_games = 0
    with open(pgn_file) as pgn:
        while True:
            # read_game parses the full game, including every move
            game = chess.pgn.read_game(pgn)
            if game is None:
                break
            num_games += 1
            # Keep the n-th game with probability 1/n
            if random.randint(1, num_games) == 1:
                random_game = game
    if num_games == 0:
        print("No games found in the PGN file.")
        return
    with open(output_file, 'w') as new_pgn:
        new_pgn.write(str(random_game))
    print("Random game extracted and saved to:", output_file)
```
However, this takes a long time to run, then crashes, and is hard to debug.
I've also tried extracting a game by its index. That works well for low indices, but extracting something like game #1000000 is slow as well:
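The selection rule in `extract_random_game` (`random.randint(1, num_games) == 1`) is reservoir sampling with a reservoir of one: the n-th game replaces the current pick with probability 1/n, so after N games each game has been kept with probability 1/N. The sampling step itself is cheap; the time goes into `chess.pgn.read_game` fully parsing every game. A minimal, library-independent sketch of the same rule (`reservoir_pick` is a hypothetical helper, not part of python-chess):

```python
import random


def reservoir_pick(items, rng=None):
    """Return one element of `items` chosen uniformly at random in one pass.

    Hypothetical helper illustrating the rule used in extract_random_game:
    the n-th item replaces the current choice with probability 1/n.
    Returns None if `items` is empty.
    """
    rng = rng if rng is not None else random.Random()
    chosen = None
    for n, item in enumerate(items, start=1):
        # randint(1, n) == 1 happens with probability exactly 1/n
        if rng.randint(1, n) == 1:
            chosen = item
    return chosen
```

Because the rule only needs to see each item once, any cheaper stand-in for a fully parsed game (such as a file offset) can be sampled the same way.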
```python
import chess.pgn


def extract_game_by_index(pgn_file, game_index, output_file):
    game_counter = 0
    target_game = None
    with open(pgn_file) as pgn:
        while True:
            # Every game before the target is still fully parsed
            game = chess.pgn.read_game(pgn)
            if game is None:
                break
            game_counter += 1
            if game_counter == game_index:
                target_game = game
                break
    if target_game is None:
        print(f"Game not found at index {game_index} in the PGN file.")
        return
    with open(output_file, 'w') as new_pgn:
        new_pgn.write(str(target_game))
    print(f"Game at index {game_index} extracted and saved to:", output_file)
```
Any ideas on ways to modify this code or a different approach that I could take?
Instead of parsing each whole game, you can read just the game headers. According to the documentation (see `chess.pgn.read_headers`), this is much cheaper for big files, because parsing the moves of many games is the expensive part.
We can thus build a game picker that reads only headers. As the documentation shows, you have to record the file offset with `pgn.tell()` before reading each header, so that you can `seek()` back and read the full game later. This function returns the game at a given index (the first game has index 1):
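```python
import chess.pgn


def pick_game(pgn, index):
    """Return the game at `index` (1-based) from an open PGN file."""
    offset = 0
    for _ in range(index):
        # Remember where this game starts, then skip over it cheaply
        offset = pgn.tell()
        chess.pgn.read_headers(pgn)
    # Jump back to the start of the last game seen and parse it fully
    pgn.seek(offset)
    return chess.pgn.read_game(pgn)
```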
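`pick_game` still scans every header up to `index`, so each call takes time proportional to the index. If you need many games, one option is to record the byte offset of every game in a single pass and then seek straight to any of them. A minimal sketch, assuming every game begins with a line starting with `[Event ` (the PGN export format puts the Event tag first, but hand-edited files may not) and that no other line starts with that text; `build_offset_index` and `read_game_bytes` are hypothetical names, and indices here are 0-based:

```python
from typing import BinaryIO, List


def build_offset_index(handle: BinaryIO) -> List[int]:
    """Byte offset of every game in a PGN file opened in binary mode ("rb").

    Assumption: each game starts with a line beginning with b"[Event ".
    No PGN parsing happens here, only a raw line scan.
    """
    offsets = []
    pos = handle.tell()
    for line in handle:
        if line.startswith(b"[Event "):
            offsets.append(pos)
        pos += len(line)  # byte length, so offsets are exact in binary mode
    return offsets


def read_game_bytes(handle: BinaryIO, offsets: List[int], i: int) -> bytes:
    """Raw PGN text of game i (0-based): from its offset to the next game's."""
    start = offsets[i]
    handle.seek(start)
    if i + 1 < len(offsets):
        return handle.read(offsets[i + 1] - start)
    return handle.read()  # the last game runs to the end of the file
```

A random game is then `read_game_bytes(f, offsets, random.randrange(len(offsets)))`. The raw bytes can be written directly to the output .pgn file (opened with `"wb"`), or decoded and passed to `chess.pgn.read_game(io.StringIO(text))` if you need a `Game` object. The offset list could also be saved to disk so the scan runs only once.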
Then, using the same header-only reading, you can count the total number of games with:
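```python
import chess.pgn


def num_games(pgn):
    num = 0
    # read_headers returns None at end of file; compare with None explicitly,
    # since a game with no header tags can yield an empty (falsy) Headers object
    while chess.pgn.read_headers(pgn) is not None:
        num += 1
    return num
```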
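python-chess also provides `chess.pgn.skip_game(handle)`, which skips over one game and returns `True` if a game was found, so counting does not need to build header objects at all and does not depend on their truthiness. A sketch, assuming python-chess 1.x; `count_games_skipping` is a hypothetical name:

```python
from typing import TextIO

import chess.pgn


def count_games_skipping(pgn: TextIO) -> int:
    """Count games by skipping them; no moves or header objects are built."""
    count = 0
    # skip_game returns True while a game was found, False at end of file
    while chess.pgn.skip_game(pgn):
        count += 1
    return count
```

It is used the same way as `num_games`, e.g. `with open(filepath_to_pgn) as pgn: total = count_games_skipping(pgn)`.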
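Finally, compute the number of games once (it does not change, so you can even store it), pick the random index, and fetch that game:
```python
import random

import chess.pgn

with open(filepath_to_pgn) as pgn:
    total_num_games = num_games(pgn)
    random_game_index = random.randint(1, total_num_games)
    pgn.seek(0)  # rewind to the start before picking
    game = pick_game(pgn, random_game_index)
```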
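Counting and then picking reads the file twice. If you only need one game per run, the reservoir rule from the question can be applied to header offsets instead of fully parsed games: the file is scanned once with `read_headers`, and only the chosen game is parsed and written out. A sketch, assuming python-chess 1.x; opening with `errors="replace"` is an assumption meant to avoid decode crashes if the file is not clean UTF-8, and `random_game_to_file` is a hypothetical name:

```python
import random

import chess.pgn


def random_game_to_file(pgn_path: str, output_path: str) -> bool:
    """Write one uniformly random game from pgn_path to output_path.

    Returns False (and writes nothing) if the file contains no games.
    """
    chosen_offset = None
    with open(pgn_path, encoding="utf-8", errors="replace") as pgn:
        n = 0
        while True:
            offset = pgn.tell()  # start of the game about to be read
            if chess.pgn.read_headers(pgn) is None:
                break  # end of file
            n += 1
            # Reservoir sampling: keep this offset with probability 1/n
            if random.randint(1, n) == 1:
                chosen_offset = offset
        if chosen_offset is None:
            return False
        # Only the chosen game is parsed in full
        pgn.seek(chosen_offset)
        game = chess.pgn.read_game(pgn)
    with open(output_path, "w", encoding="utf-8") as out:
        # printing a Game writes it as PGN text
        print(game, file=out, end="\n\n")
    return True
```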