python-3.x, pandas, apache-nifi

Pandas Encoding


I am executing Python code with the Apache NiFi ExecuteStreamCommand processor.

I am reading a CSV file that I know is Latin-1 encoded, so I read the incoming file stream (sys.stdin) with:

pd.read_csv(sys.stdin, encoding='latin')

But pandas keeps throwing this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 172: invalid continuation byte

It seems that pandas ignores the encoding parameter entirely and tries UTF-8 no matter what!
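Whether pandas is really the component ignoring `encoding` can be checked without pandas at all. `sys.stdin` is already a text stream (an `io.TextIOWrapper`) that Python decodes with a default encoding before any reader sees the data. The stdlib-only sketch below reproduces the same kind of error; the sample bytes and the `read_as` helper are hypothetical, and `io.BytesIO` stands in for the bytes NiFi pipes into the process.

```python
import io

# Hypothetical sample data: a Latin-1 encoded CSV; "é" is the single byte 0xe9 in Latin-1.
RAW = "name,city\nRené,Montréal\n".encode("latin-1")


def read_as(encoding: str) -> str:
    """Decode RAW through a text wrapper, the same kind of object sys.stdin is."""
    fake_stdin = io.TextIOWrapper(io.BytesIO(RAW), encoding=encoding)
    return fake_stdin.read()


try:
    read_as("utf-8")  # decoding fails here, before any CSV parser is involved
except UnicodeDecodeError as exc:
    print("utf-8 wrapper fails:", exc.reason)

text = read_as("latin-1")  # the same bytes decode cleanly with the right codec
print(len(text.splitlines()), "lines decoded")
```

If the UTF-8 wrapper already fails on byte 0xe9, the error is raised by the stream pandas is handed, not by `read_csv` itself.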

Any ideas? Thank you for your help.


Solution

  • I finally managed to find a solution.

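    My first guess was that pandas opens the stream, then parses it as CSV and applies the encoding. In fact, sys.stdin is already a text stream that Python opens with a default encoding derived from the environment (UTF-8 here), so the bytes are decoded before pandas ever sees them and the encoding parameter has no effect. I therefore re-wrapped the underlying byte stream, sys.stdin.buffer, with the correct encoding: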

    import io, sys
    sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='latin')
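The same fix can be packaged as a small helper. This is a minimal sketch, assuming the flow file content reaches the script on stdin as Latin-1 bytes; the function name `read_latin1_csv` is hypothetical. It wraps whatever binary stream it is given instead of reassigning `sys.stdin`, so it can also be exercised with an in-memory `io.BytesIO`.

```python
import io

import pandas as pd


def read_latin1_csv(binary_stream) -> pd.DataFrame:
    """Parse CSV from a binary stream whose bytes are Latin-1 encoded.

    `binary_stream` is expected to be a binary file object, e.g. sys.stdin.buffer.
    """
    # Attach the correct decoder to the raw bytes, as the re-wrapped sys.stdin does.
    text_stream = io.TextIOWrapper(binary_stream, encoding="latin-1")
    # pandas now receives correctly decoded text, so no encoding argument is needed here.
    return pd.read_csv(text_stream)
```

Inside the NiFi script, the call would be `df = read_latin1_csv(sys.stdin.buffer)`. Wrapping a fresh stream keeps the decoding choice local to the one place that reads the input; otherwise it behaves the same as reassigning `sys.stdin`.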