Search code examples
pythonoptimizationwordle-game

Optimizing Wordle Bot with Python - Search for a word that contains a, b, and c?


I have been working on writing a Wordle bot, and wanted to see how it preforms with all 13,000 words. The problem is that I am running this through a for loop and it is very inefficient. After running it for 30 minutes, it only gets to around 5%. I could wait all that time, but it would end up being 10+ hours. There has got to be a more efficient way. I am new to python, so any suggestions would be greatly appreciated.

The code here is the code that is used to limit down the guesses each time. Would there be a way to search for a word that contains "a", "b", and "c"? Instead of running it 3 separate times. Right now the containts, nocontains, and isletter will each run every time I need to search for a new letter. Searching them all together would greatly reduce the time.

#Find the words that only match the criteria
def contains(letter, place):
    list.clear()
    for x in words:
        if x not in removed:
            if letter in x:
                if letter == x[place]:
                    removed.append(x)
                else:
                    list.append(x)
            else:
                removed.append(x)
def nocontains(letter):
    list.clear()
    for x in words:
        if x not in removed:
            if letter not in x:
                list.append(x)
            else:
                removed.append(x)
def isletter(letter, place):
    list.clear()
    for x in words:
        if x not in removed:
            if letter == x[place]:
                list.append(x)
            else:
                removed.append(x)

Solution

  • The performance problems can be massively reduced by using sets. Any time that you want to repeatedly test for membership (even only a few times), e.g. if x not in removed, you want to try to make a set. Lists require checking every element to find x, which is bad if the list has thousands of elements. In a Python set, if x not in removed should take as long to run if removed has 100 elements or 100,000, a small constant amount of time.

    Besides this, you're running into problems by trying to use mutable global variables everywhere, like for list (which needs to be renamed) and removed. There's no benefit to doing that and several downsides, such as making it harder to reason about your code or optimize it. One benefit of Python is that you can pass large containers or objects to functions without any extra time or space cost: calling a function f(huge_list) is as fast and uses as much memory as f(tiny_list), as if you were passing by reference in other languages, so don't hesitate to use containers as function parameters or return types.

    In summary, here's how your code could be refactored if you take away 'list' and 'removed' and instead store this as a set of possible words:

    all_words = []  # Huge word list to read in from text file
    current_possible_words = set(all_words)
    
    def contains_only_elsewhere(possible_words, letter, place):
        """Given letter and place, remove from possible_words
         all words containing letter but not at place"""
        to_remove = {word for word in possible_words
                     if letter not in word or word[place] == letter}
        return possible_words - to_remove
    
    def must_not_contain(possible_words, letter):
        """Given a letter, remove from possible_words all words containing letter"""
        to_remove = {word for word in possible_words
                     if letter in word}
        return possible_words - to_remove
    
    def exact_letter_match(possible_words, letter, place):
        """Given a letter and place, remove from possible_words
         all words not containing letter at place"""
        to_remove = {word for word in possible_words
                     if word[place] != letter}
        return possible_words - to_remove
    

    The outside code will be different: for example,

    current_possible_words = exact_letter_match(current_possible_words, 'a', 2)`
    

    Further optimizations are possible (and much easier now): storing only indices to words rather than the strings; precomputing, for each letter, the set of all words containing that letter, etc.