Tags: python, python-3.x, selenium, encoding, utf-8

Encoding Error - 'charmap' codec can't encode character '\u015f'


It seems that I cannot encode the character '\u015f' (letter s with cedilla). Please could someone help?

from selenium import webdriver
import time

with open('Violators_UNGC1.csv', 'w',encoding='utf-8'.replace(u"\u015f", "ş")) as file:
    file.write("Participants; Sectors; Countries; Expelled \n")

driver=webdriver.Chrome(executable_path='C:\webdrivers\chromedriver.exe')

driver.get('https://www.unglobalcompact.org/participation/report/cop/create-and-submit/expelled?page=1&per_page=250')

driver.maximize_window()
time.sleep(2)

for k in range(150):
    
    Participants = driver.find_elements("xpath",'//td[@class="participant"]/a')  
    
    Sectors = driver.find_elements("xpath",'//td[@class="sector"]')
    
    Countries = driver.find_elements("xpath",'//td[@class="country"]')
    
    Expelled = driver.find_elements("xpath",'//td[@class="year"]') 
    
    time.sleep(1)
    
    with open('Violators_UNGC1.csv', 'a') as file:
        for i in range(len(Participants)):
            file.write(Participants[i].text + ";" + Sectors[i].text + ";" + Countries[i].text + ";" + Expelled[i].text + "\n")
            
driver.close()

When I run it, I get the following error message:

UnicodeEncodeError                        Traceback (most recent call last)
Cell In [15], line 28
     26     with open('Violators_UNGC1.csv', 'a') as file:
     27         for i in range(len(Participants)):
---> 28             file.write(Participants[i].text + ";" + Sectors[i].text + ";" + Countries[i].text + ";" + Expelled[i].text + "\n")
     30 driver.close()

File ~\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py:19, in IncrementalEncoder.encode(self, input, final)
     18 def encode(self, input, final=False):
---> 19     return codecs.charmap_encode(input,self.errors,encoding_table)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\u015f' in position 32: character maps to <undefined>

Thank you all!


Solution

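  • As mentioned in the comments, the default encoding of open() is platform-dependent and should be declared explicitly. In the question, only the first open() call passes encoding='utf-8'; the second call, in append mode, omits it and falls back to the locale default, which the traceback shows is cp1252, a code page with no mapping for 'ş'. UTF-8 can encode every Unicode character. I also suggest opening the file once instead of re-opening it on every loop iteration, and using the csv module to write CSV files: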

    import csv
    
    # newline='' lets the csv module control line endings (recommended by the csv docs)
    with open('Violators_UNGC1.csv', 'w', encoding='utf-8', newline='') as file:
        w = csv.writer(file, delimiter=';')
        w.writerow(['Participants','Sectors','Countries','Expelled'])
    
        # Fake data for demonstration
        Participants = 'oneş','twoş','threeş'
        Sectors = 'sec1','sec2','sec3'
        Countries = 'USA','Germany','France'
        Expelled = 'A','B','C'
    
        # zip returns all the first items in each group, then the 2nd, etc.
        for row in zip(Participants, Sectors, Countries, Expelled):
            w.writerow(row)
    

    Output file:

    Participants;Sectors;Countries;Expelled
    oneş;sec1;USA;A
    twoş;sec2;Germany;B
    threeş;sec3;France;C
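
To tie this back to the scraping loop in the question, here is a minimal sketch. It assumes the Selenium elements on each page have already been reduced to lists of strings (for example with `[e.text for e in Participants]`). The helper name `append_page`, the stand-in page data, and the temporary output path are hypothetical and exist only for illustration. The sketch first checks that the cp1252 codec named in the traceback really rejects 'ş' while UTF-8 accepts it, then writes several pages through a single file handle opened with an explicit encoding.

```python
import csv
import os
import tempfile

# The traceback shows cp1252 was used; it has no mapping for U+015F ('ş').
try:
    "ş".encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# UTF-8 can encode any Unicode character.
utf8_bytes = "ş".encode("utf-8")


def append_page(writer, participants, sectors, countries, expelled):
    """Write one scraped page (four parallel lists of strings) as CSV rows."""
    for row in zip(participants, sectors, countries, expelled):
        writer.writerow(row)


# Hypothetical output path in a temp directory, so the sketch is safe to run.
path = os.path.join(tempfile.mkdtemp(), "Violators_UNGC1.csv")

# Open the file once, with an explicit encoding, and keep it open for all pages.
with open(path, "w", encoding="utf-8", newline="") as file:
    w = csv.writer(file, delimiter=";")
    w.writerow(["Participants", "Sectors", "Countries", "Expelled"])

    # Stand-in data: each tuple plays the role of one page of results.
    pages = [
        (["oneş"], ["sec1"], ["USA"], ["A"]),
        (["twoş", "threeş"], ["sec2", "sec3"], ["Germany", "France"], ["B", "C"]),
    ]
    for page in pages:
        append_page(w, *page)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8", newline="") as file:
    rows = list(csv.reader(file, delimiter=";"))
```

In the real script, the `pages` loop would be replaced by the existing `for k in range(150)` loop, calling `append_page` once per iteration with the `.text` values of the found elements.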