Remove non-ascii characters from CSV using pandas


I'm querying a table in a SQL Server database and exporting the result to a CSV file using pandas:

import pandas as pd

df = pd.read_sql_query(sql, conn)
df.to_csv(csvFile, index=False)

Is there a way to remove non-ASCII characters when exporting the CSV?


Solution

  • After writing the CSV, you can read the file back in and use a regular expression to strip out non-ASCII characters:

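    import re

    df.to_csv(csvFile, index=False)

    # Read the exported file back (to_csv writes UTF-8 by default)
    # and drop every character outside the ASCII range 0x00-0x7F
    with open(csvFile, encoding='utf-8') as f:
        new_text = re.sub(r'[^\x00-\x7F]+', '', f.read())

    # Overwrite the file with the ASCII-only text
    with open(csvFile, 'w', encoding='ascii') as f:
        f.write(new_text)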
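
An alternative that avoids re-reading the file is to clean the string values in the DataFrame before exporting. The following is a minimal sketch, not part of the original answer: the helper names `strip_non_ascii` and `ascii_only` and the sample data are hypothetical, and it assumes only cell values (not column names) need cleaning.

```python
import pandas as pd


def strip_non_ascii(value):
    """Drop non-ASCII characters from strings; leave other values untouched."""
    if isinstance(value, str):
        return value.encode("ascii", errors="ignore").decode("ascii")
    return value


def ascii_only(df):
    """Return a copy of df with non-ASCII characters removed from text cells."""
    cleaned = df.copy()
    for col in cleaned.columns:
        # Numeric columns cannot hold text, so skip them
        if not pd.api.types.is_numeric_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].map(strip_non_ascii)
    return cleaned


# Hypothetical sample data standing in for the SQL query result
df = pd.DataFrame({"name": ["café", "naïve", None], "qty": [1, 2, 3]})
clean = ascii_only(df)
csv_text = clean.to_csv(index=False)  # pass a path instead to write a file
```

With the DataFrame from the question, this would be used as `ascii_only(df).to_csv(csvFile, index=False)`. Like the regular-expression approach, it deletes the characters outright rather than transliterating them, so "café" becomes "caf".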