python, list, for-loop, indexing, data-extraction

How to loop through a list of words and only keep the ones that have specific letters at specific indexes in Python


I have a list of 200,000 words, a list of indexes, and a keyword. indexList is not predefined and can contain anywhere from 0 to len(keyword) indexes.

I want to iterate through the 200,000 words and keep only those that have the same letter as the keyword at every index in indexList.

Examples:

keyword = "BEANS" 
indexList = [0, 3] 

I want to keep words that have 'B' at index 0 and 'N' at index 3.

keyword = "BEANS"
indexList = [0, 1, 2]

I want to keep words that have 'B' at index 0, 'E' at index 1, and 'A' at index 2.

keyword = "BEANS"
indexList = []

No indexes are specified, so return all 200,000 words.

At the moment I have this code, where sampleSpace refers to the list of 200,000 words:

extractedList = []
for i in range(len(indexList)):
    for word in sampleSpace:      
        if (word[indexList[i]] == keyword[indexList[i]]):
            extractedList.append(word)

However, this code extracts words that match the keyword at the first index OR the second index OR the Nth index, and a word that matches at several indexes is appended once per match.

I need the words to match at ALL of the specified indexes.
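
A small reproduction may make the problem easier to see. This is only a sketch: the five-word sampleSpace below is a made-up stand-in for the real 200,000-word list.

```python
# Hypothetical five-word sample standing in for the 200,000-word list.
sampleSpace = ["BLAND", "BRICK", "CRANE", "BEING", "MOUNT"]
keyword = "BEANS"
indexList = [0, 3]

extractedList = []
for i in range(len(indexList)):
    for word in sampleSpace:
        # Each pass checks a single index, so a word is appended
        # whenever it matches at that one position.
        if word[indexList[i]] == keyword[indexList[i]]:
            extractedList.append(word)

print(extractedList)
# ['BLAND', 'BRICK', 'BEING', 'BLAND', 'CRANE', 'BEING', 'MOUNT']
```

Only BLAND and BEING have both 'B' at index 0 and 'N' at index 3, yet BRICK, CRANE, and MOUNT are kept as well, and the two real matches appear twice.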


Solution

  • You can use a simple list comprehension with all. Have the comprehension loop over every word in the big word list, and use all to check every index in indexList. Because all returns True for an empty iterable, an empty indexList keeps every word:

    >>> from wordle_solver import wordle_corpus as corpus
    >>> keyword = "BEANS"
    >>> indexList = [0, 3]
    >>> [word for word in corpus if all(keyword[i] == word[i] for i in indexList)]
    ['BLAND', 'BRUNT', 'BUNNY', 'BLANK', 'BRINE', 'BLEND', 'BLINK', 'BLUNT', 'BEING', 'BRING', 'BRINY', 'BOUND', 'BLOND', 'BURNT', 'BORNE', 'BRAND', 'BRINK', 'BLIND']
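
    The same idea can be wrapped in a function that runs without the wordle_solver module. This is a minimal sketch, not part of the original answer: the name filter_by_positions and the six-word sample are hypothetical, and the length check is an added assumption to avoid an IndexError if the word list contains words shorter than the largest index.

```python
def filter_by_positions(words, keyword, index_list):
    """Keep the words that match keyword at every index in index_list."""
    return [
        word
        for word in words
        # The length check (an assumption, not in the original answer)
        # skips words too short to have a letter at index i.
        if all(i < len(word) and word[i] == keyword[i] for i in index_list)
    ]


# Hypothetical sample list; "BE" is included to exercise the length check.
sample = ["BLAND", "BRICK", "CRANE", "BEING", "MOUNT", "BE"]

print(filter_by_positions(sample, "BEANS", [0, 3]))  # ['BLAND', 'BEING']
print(filter_by_positions(sample, "BEANS", [0, 1]))  # ['BEING', 'BE']
print(filter_by_positions(sample, "BEANS", []))      # every word in sample
```

    Each word is visited once and all stops at the first mismatch, so the result has no duplicates and the work per word is at most len(index_list) comparisons.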