python character-encoding

Getting error: 'charmap' codec can't encode characters in position 8-13: character maps to <undefined>


I get this error:

Traceback (most recent call last):
  File "C:\Users\Anthony\PycharmProjects\ReadFile\main.py", line 14, in <module>
    masterFile.write("Line {}: {}\n".format(index, line.strip()))
  File "C:\Users\Anthony\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 8-13: character maps to <undefined>

The program is supposed to find every .txt file in a directory and search each one for a specific word. When it finds the word, it writes the matching line, with its line number, to one file. It also writes every line of every file, with line numbers, to a second file. There are about 100 .txt files, and it gets through the first 3 before I get this error. All the files are UTF-8 encoded. I tried changing the read to with open(file, encoding="utf-8") as f: but it didn't work.

import glob

searchWord = "Hello"
dataFile = open("C:/Users/Anthony/Documents/TextDataFolder/TextData.txt", 'w')
masterFile = open("C:/Users/Anthony/Documents/TextDataFolder/masterFile.txt", 'w')

files = glob.iglob("C:/Users/Anthony/Documents/Texts/*.txt", recursive = True)
for file in files:
    with open(file) as f:  
        print(file)
        for index, line in enumerate(f):
             #print("Line {}: {}".format(index, line.strip()))
             masterFile.write("Line {}: {}\n".format(index, line.strip()))
             if searchWord in line:
                print("Line {}: {}".format(index, line.strip()))
                dataFile.write("Line {}: {}\n".format(index, line.strip()))

Solution

  • I eventually figured it out... I feel like an idiot. The problem wasn't how I was reading the files. It was that I wasn't specifying an encoding when writing; I had only tried setting the encoding on the read. So the final version looks like this:

    import glob
    
    searchWord = "Hello"
    # Open both output files as UTF-8; without encoding= the platform default (cp1252 here) is used.
    dataFile = open("C:/Users/Anthony/Documents/TextDataFolder/TextData.txt", 'w', encoding="utf-8")
    masterFile = open("C:/Users/Anthony/Documents/TextDataFolder/masterFile.txt", 'w', encoding="utf-8")
    
    files = glob.iglob("C:/Users/Anthony/Documents/Texts/*.txt", recursive = True)
    for file in files:
        with open(file, "r", encoding="utf-8") as f:  
            print(file)
            for index, line in enumerate(f):
                 #print("Line {}: {}".format(index, line.strip()))
                 masterFile.write("Line {}: {}\n".format(index, line.strip()))
                 if searchWord in line:
                    print("Line {}: {}".format(index, line.strip()))
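                    dataFile.write("Line {}: {}\n".format(index, line.strip()))
    
    # Close the output files so everything written is flushed to disk.
    dataFile.close()
    masterFile.close()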
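
Here is a minimal sketch of why the write side was the part that mattered. It assumes the machine's default text encoding was cp1252, which the traceback suggests since the failure is raised inside encodings\cp1252.py. The helper name try_write and the sample text are made up for illustration and are not part of the original script; the sketch passes the encoding explicitly so it behaves the same on any platform.

```python
import os
import tempfile

# Sample line containing characters that cp1252 has no mapping for
# (an arrow and some Japanese text). Purely illustrative data.
text = "Line 0: caf\u00e9 \u2192 \u65e5\u672c\u8a9e\n"


def try_write(encoding):
    """Write `text` to a temporary file using `encoding`; return True on success."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # reopen below in text mode with the chosen encoding
    try:
        with open(path, "w", encoding=encoding) as out:
            out.write(text)
        return True
    except UnicodeEncodeError as exc:
        # Same exception type as in the traceback from the question.
        print("{} failed: {}".format(encoding, exc))
        return False
    finally:
        os.remove(path)


print(try_write("cp1252"))  # False: the arrow and the Japanese text cannot be encoded
print(try_write("utf-8"))   # True: UTF-8 can encode any Unicode text
```

Reading with encoding="utf-8" only controls how bytes are decoded into str. Encoding happens again when the text is written out, and that step uses whatever encoding the output file was opened with. To see which default a given machine would pick when encoding= is omitted, locale.getpreferredencoding(False) can be printed.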