Tags: python, list, python-2.7, pattern-matching, anagram

Python - Checking if all and only the letters in a list match those in a string?


I'm creating an Anagram Solver in Python 2.7.

The solver takes an anagram entered by the user, converts it to a list of letters, and checks that list against each line of a '.txt' word file. Any word that matches the anagram's letters is appended to a possible_words list, ready for printing.

It works... almost!


# Anagram_Solver.py

anagram = list(raw_input("Enter an Anagram: ").lower())

possible_words = []

with file('wordsEn.txt', 'r') as f:

    for line in f:

        if all(x in line + '\n' for x in anagram) and len(line) == len(anagram) + 1:

            line = line.strip()
            possible_words.append(line)

print "\n".join(possible_words)

For anagrams with no duplicate letters it works fine, but for a word such as 'hello' the output also contains words such as 'helio', 'whole' and 'holes'. The solver doesn't seem to count the two 'l's as separate entries.
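
The behaviour can be reproduced without the word file. The snippet below is only a minimal sketch: the short `words` list is made up for illustration and stands in for a few lines of wordsEn.txt, and the words are already stripped, so the length check needs no `+ 1`. It is written so that it runs under both Python 2.7 and Python 3.

```python
# Hypothetical stand-in for a handful of lines from wordsEn.txt.
words = ["hello", "helio", "whole", "holes", "world"]

anagram = list("hello")

# Same membership test as in the solver, applied to stripped words.
matches = [w for w in words
           if all(x in w for x in anagram) and len(w) == len(anagram)]

print(matches)
```

Every five-letter word containing 'h', 'e', 'l' and 'o' gets through, whether it has one 'l' or two.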

What am I doing wrong? I feel like there is a simple solution that I'm missing.

Thanks!


Solution

  • Your code does exactly what it was written to do. Nothing in it checks whether a letter appears twice (or three or more times): for 'hello' it simply evaluates 'l' in line twice, and that is True for every word containing at least one 'l'.

    One method would be to count the letters of each word. If the letter counts of the two words are equal, they are anagrams of each other. This can be achieved easily with the collections.Counter class:

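    from collections import Counter
    
    anagram = raw_input("Enter an Anagram: ").lower()
    target = Counter(anagram)  # count the anagram's letters once, up front
    possible_words = []        # collects every matching word
    
    with open('wordsEn.txt', 'r') as f:
        for line in f:
            line = line.strip()  # drop the trailing newline before comparing
            if Counter(line) == target:
                possible_words.append(line)
    
    print "\n".join(possible_words)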
    

    Another method would be to use the sorted() function, as suggested by Chris in the other answer's comments. This sorts the letters of both the anagram and the line into alphabetical order and then checks whether the two sorted lists match. For short strings such as dictionary words, this typically runs faster than the Counter method.

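    anagram = raw_input("Enter an Anagram: ").lower()
    target = sorted(anagram)  # sort the anagram's letters once, up front
    possible_words = []       # collects every matching word
    
    with open('wordsEn.txt', 'r') as f:
        for line in f:
            line = line.strip()  # drop the trailing newline before comparing
            if sorted(line) == target:
                possible_words.append(line)
    
    print "\n".join(possible_words)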
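
If the check is needed in more than one place, both comparisons can be wrapped in small functions. The following is only a sketch under some assumptions: the names `is_anagram_counter`, `is_anagram_sorted` and `find_anagrams` are made up for illustration, and the in-memory `words` list stands in for the lines of wordsEn.txt. It is written so that it runs under both Python 2.7 and Python 3.

```python
from collections import Counter


def is_anagram_counter(a, b):
    # Equal Counters mean the same letters with the same multiplicities.
    return Counter(a) == Counter(b)


def is_anagram_sorted(a, b):
    # Two words are anagrams if their sorted letters are identical.
    return sorted(a) == sorted(b)


def find_anagrams(anagram, lines, matcher=is_anagram_sorted):
    # `lines` can be any iterable of strings, including an open file.
    anagram = anagram.lower()
    stripped = (line.strip() for line in lines)
    return [word for word in stripped if matcher(anagram, word)]


# Hypothetical stand-in for a few lines of wordsEn.txt.
words = ["hello\n", "helio\n", "whole\n", "holes\n", "olleh\n"]

result_sorted = find_anagrams("Hello", words)
result_counter = find_anagrams("Hello", words, matcher=is_anagram_counter)

print(result_sorted)
```

Because `find_anagrams` only iterates over `lines`, passing the file object from `open('wordsEn.txt')` should work the same way as the list used here.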