Tags: python, pandas, encoding, google-colaboratory

ANSI Encoding for pandas on Google Colab?


I have a file named 'students_data.txt' that holds tab-separated records and is encoded as ANSI. On my local Windows machine, where the ANSI encoding is always available, I can read the file easily with pandas:

pd.read_csv(input_directory + 'students_data.txt', sep='\t', encoding='ANSI')

The data is read without problems. On Google Colab, however, the same call produces this error:

LookupError: unknown encoding: ansi on pandas

Interestingly, the pandas version is the same on my machine and on Colab, so my guess is that I cannot decode ANSI files because of something about the Colab machines themselves.
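
One way to test that guess is to ask Python's codec registry directly, since pandas hands the encoding name to Python for decoding. This is only a minimal sketch; the helper name codec_available is made up for illustration and is not part of pandas or the standard library.

```python
import codecs
import platform


def codec_available(name):
    """Return True if Python's codec registry knows the encoding `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Operating system the interpreter runs on ('Windows', 'Linux', ...).
print(platform.system())

# 'ansi' is only registered on Windows, so this is expected to be
# False on a Linux machine such as Colab.
print(codec_available("ansi"))

# 'cp1252' ships with Python on every platform.
print(codec_available("cp1252"))
```

If the second line prints False on Colab and True locally, the difference comes from the platform's Python codecs rather than from pandas.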

So my questions are:

  • How can I use ANSI encoding with pandas on Google Colab?
  • Why does the encoding support in pandas depend on the platform it is used on?

Solution

  • Try using ISO-8859-1 encoding

    pd.read_csv(input_directory + 'students_data.txt', sep='\t', encoding='ISO-8859-1')
    

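    This works because 'ANSI' is not a real encoding name but a Microsoft term for the system's default code page. Python, which pandas relies on for decoding, accepts 'ansi' only on Windows (as an alias for the 'mbcs' codec). Google Colab runs Linux (this can be checked, for example, with platform.system()), so the name is unknown there. On Western-European Windows installations the ANSI code page is usually Windows-1252, which matches ISO-8859-1 everywhere except bytes 0x80 to 0x9F, so there is a good chance ISO-8859-1 will work for ANSI files. If characters such as the euro sign or curly quotes come out wrong, encoding='cp1252' is the closer match.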
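
ISO-8859-1 and the usual Windows "ANSI" code page differ only for bytes 0x80 to 0x9F, which can be seen without pandas. The sketch below assumes the file was written with the Western-European code page (cp1252), which the question does not confirm; the sample row is made up for illustration.

```python
# Made-up sample row, encoded the way a cp1252 ("ANSI") file would store it.
raw = "Zoë\t€\n".encode("cp1252")

as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("iso-8859-1")

# Letters such as "ë" sit at the same byte value in both encodings ...
assert as_cp1252[:3] == as_latin1[:3] == "Zoë"

# ... but the euro sign (byte 0x80) only survives with cp1252;
# ISO-8859-1 maps that byte to an invisible control character.
print(repr(as_cp1252))  # 'Zoë\t€\n'
print(repr(as_latin1))  # 'Zoë\t\x80\n'
```

Neither decode raises an error, so a wrong choice shows up only as odd characters in the data. If the euro sign or curly quotes matter in the real file, passing encoding='cp1252' to pd.read_csv in place of 'ISO-8859-1' is likely the safer choice.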