I'm trying to build a method that imports several kinds of CSV or Excel files and standardizes them. Everything was running smoothly until a certain CSV showed up and raised this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 133: invalid continuation byte
I'm building a set of try/excepts to cover the variations in the files, but I couldn't figure out how to handle this one.
if csv_or_excel_path[-3:] == 'csv':
    try: table = pd.read_csv(csv_or_excel_path)
    except:
        try: table = pd.read_csv(csv_or_excel_path, sep=';')
        except:
            try: table = pd.read_csv(csv_or_excel_path, sep='\t')
            except:
                try: table = pd.read_csv(csv_or_excel_path, encoding='utf-8')
                except:
                    try: table = pd.read_csv(csv_or_excel_path, encoding='utf-8', sep=';')
                    except: table = pd.read_csv(csv_or_excel_path, encoding='utf-8', sep='\t')
By the way, the separator of the file is ";".
So:
a) I understand it would be easier to track down the problem if I could identify the character at "position 133", but I'm not sure how to find that out. Any suggestions?
b) Does anyone have a suggestion on what to include in that try/except sequence to get past this problem?
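On (a): the position in the message is a byte offset, not a character index, and when the error comes from pandas it may be relative to whatever buffer was being decoded. One way to see the offending byte is to read the file in binary mode and decode it yourself. The sketch below is only an illustration: `show_bad_byte` is a made-up helper name, and the sample bytes stand in for the real file, whose actual encoding is unknown. Note also that `encoding='utf-8'` is already the default for `pd.read_csv`, so the last three attempts in the question repeat the first three.

```python
def show_bad_byte(raw, encoding="utf-8", context=20):
    """Return (offset, bad byte, nearby bytes), or None if raw decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte in raw
        start = max(err.start - context, 0)
        return err.start, raw[err.start:err.start + 1], raw[start:err.start + context]
    return None


# Hypothetical sample standing in for the real file. For a real file, use:
#     with open(csv_or_excel_path, "rb") as f:
#         raw = f.read()
sample = "nome;cidade\nAna;ÍTALO\n".encode("ISO-8859-1")

result = show_bad_byte(sample)
print(result)

# Guessing that the byte is Latin-1 (an assumption, not a certainty):
print(result[1].decode("ISO-8859-1"))
```

If the byte is 0xcd, as in the traceback, decoding it as ISO-8859-1 gives "Í", which suggests (but does not prove) that the file was saved in a Latin-1 style encoding rather than UTF-8.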
For the record, this is probably better than multiple try/excepts:
import os

import pandas as pd


def read_csv(filepath):
    if os.path.splitext(filepath)[1] != '.csv':
        return  # or whatever
    seps = [',', ';', '\t']  # ',' is default
    encodings = [None, 'utf-8', 'ISO-8859-1']  # None is default
    # Try every separator/encoding pair and return the first that parses
    for sep in seps:
        for encoding in encodings:
            try:
                return pd.read_csv(filepath, encoding=encoding, sep=sep)
            except Exception:  # should really be more specific
                pass
    raise ValueError("{!r} has no encoding in {} or separator in {}"
                     .format(filepath, encodings, seps))
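One caveat with trying combinations in order: ISO-8859-1 can decode any byte sequence, so a ";"-separated file may parse "successfully" with `sep=','` as a single column instead of raising. A possible alternative, sketched below under that assumption, is to pick the encoding first and then let the standard library's `csv.Sniffer` guess the separator. The helper name `guess_encoding_and_sep` and the sample bytes are made up for illustration, and the candidate encodings are just the ones used above.

```python
import csv


def guess_encoding_and_sep(raw, encodings=("utf-8", "ISO-8859-1"), seps=",;\t"):
    """Return (encoding, separator) for raw CSV bytes, or raise ValueError."""
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong encoding, try the next candidate
        # Sniff the separator from the first few lines only
        sample = "\n".join(text.splitlines()[:20])
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=seps)
        except csv.Error as err:
            raise ValueError("no separator in {!r} fits the data".format(seps)) from err
        return encoding, dialect.delimiter
    raise ValueError("none of {} could decode the data".format(encodings))


# Hypothetical sample standing in for the real file contents
sample = "nome;cidade\nAna;Ítalo\nBia;Porto\n".encode("ISO-8859-1")
encoding, sep = guess_encoding_and_sep(sample)
print(encoding, repr(sep))
```

The result can then be passed along as `pd.read_csv(filepath, encoding=encoding, sep=sep)`. Keep ISO-8859-1 last in the list, since it never fails to decode and would otherwise mask every encoding after it.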