What would be a good way to write a function that returns a random chess game played by two masters as a PGN file?


I am trying to create a function that returns a random master chess game as a .pgn file.

My approach so far: I downloaded the Caissabase chess database, which is quite large and contains millions of chess games. My original plan was to simply read a random game from this file, like so:

import random

import chess.pgn

def extract_random_game(pgn_file, output_file):
    random_game = None
    num_games = 0

    with open(pgn_file) as pgn:
        while True:
            game = chess.pgn.read_game(pgn)
            if game is None:
                break
            num_games += 1
            if random.randint(1, num_games) == 1:
                random_game = game

    if num_games == 0:
        print("No games found in the PGN file.")
        return

    with open(output_file, 'w') as new_pgn:
        new_pgn.write(str(random_game))

    print("Random game extracted and saved to:", output_file)

However, this takes a long time to run, then crashes, and the crash is hard to debug.

I've also tried extracting a game by its index. That works well for low index numbers, but extracting something like game #1000000 also takes a long time:

def extract_game_by_index(pgn_file, game_index, output_file):
    game_counter = 0
    target_game = None

    with open(pgn_file) as pgn:
        while True:
            game = chess.pgn.read_game(pgn)
            if game is None:
                break

            game_counter += 1
            if game_counter == game_index:
                target_game = game
                break

    if target_game is None:
        print(f"Game not found at index {game_index} in the PGN file.")
        return

    with open(output_file, 'w') as new_pgn:
        new_pgn.write(str(target_game))

    print(f"Game at index {game_index} extracted and saved to:", output_file)

Any ideas on ways to modify this code or a different approach that I could take?


Solution

  • Instead of parsing every game in full, you can read only the game headers. According to the documentation for chess.pgn.read_headers, this skips parsing the moves, which reduces the processing time for big files.

    We can thus build a game picker that reads only headers. As the documentation shows, you have to record the file offset before reading each header so that you can seek back and read the full game later. This function returns the game at a given index (the first game has index 1):

    def pick_game(pgn, index):
        offset = 0
        for _ in range(index):
            offset = pgn.tell()
            chess.pgn.read_headers(pgn)
        pgn.seek(offset)
        return chess.pgn.read_game(pgn)
    

    Using the same idea, you can count the total number of games with:

    def num_games(pgn):
        num = 0
        # Compare against None explicitly: a game without tags yields empty, falsy headers.
        while chess.pgn.read_headers(pgn) is not None:
            num += 1
        return num
    

    Finally, pick the random game index once, up front, and then fetch only that game:

    with open(filepath_to_pgn) as pgn:
        total_num_games = num_games(pgn)
    random_game_index = random.randint(1, total_num_games)
    with open(filepath_to_pgn) as pgn:
        game = pick_game(pgn, random_game_index)
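
If you need many random games from the same file, scanning the headers twice per pick still adds up. One possible variation, sketched below, is to record the byte offset of every game once and then seek straight to a random one. Note that this swaps in a plain-text scan instead of chess.pgn.read_headers: it assumes every game starts with a line beginning with `[Event ` (the first tag in standard PGN export format), that no such line appears inside a comment, and that the file has no byte-order mark. The function names `build_game_index`, `read_raw_game` and `write_random_game` are made up for this illustration and are not part of python-chess.

```python
import random


def build_game_index(pgn_path):
    """Return the byte offset of every line that starts with '[Event '.

    Assumption: each game begins with an Event tag at the start of a line.
    """
    offsets = []
    # Binary mode keeps tell()/seek() offsets exact and avoids decoding costs.
    with open(pgn_path, "rb") as pgn:
        while True:
            offset = pgn.tell()
            line = pgn.readline()
            if not line:  # end of file
                break
            if line.startswith(b"[Event "):
                offsets.append(offset)
    return offsets


def read_raw_game(pgn_path, offsets, i):
    """Return the raw bytes of game i (0-based) using the offset index."""
    start = offsets[i]
    # The game ends where the next one starts, or at end of file.
    end = offsets[i + 1] if i + 1 < len(offsets) else None
    with open(pgn_path, "rb") as pgn:
        pgn.seek(start)
        return pgn.read() if end is None else pgn.read(end - start)


def write_random_game(pgn_path, output_path, offsets, rng=random):
    """Copy one uniformly chosen game to output_path and return its index."""
    i = rng.randrange(len(offsets))
    with open(output_path, "wb") as out:
        out.write(read_raw_game(pgn_path, offsets, i))
    return i
```

Building the index is still one full pass over the file, but it only compares line prefixes, and the resulting list can be saved (for example with pickle) and reused. If you need a game object rather than raw text, the decoded text can be wrapped in io.StringIO and passed to chess.pgn.read_game.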