I am executing Python code with Apache NiFi's ExecuteStreamCommand processor.
I am reading a CSV whose encoding I know is Latin-1, so I read the file (a file stream object) with:
pd.read_csv(sys.stdin, encoding='latin')
But pandas keeps throwing this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 172: invalid continuation byte
So it seems that pandas ignores the given encoding parameter entirely and tries utf-8 at any cost!
Any ideas? Thank you for your help.
I finally managed to find a solution.
My guess is that sys.stdin is already open as a text stream that decodes with utf-8 by default, so the bytes get decoded before the encoding parameter given to pandas can take effect. I therefore replaced sys.stdin with the following, which re-wraps the underlying byte stream (sys.stdin.buffer) with the correct encoding:
sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='latin')
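To illustrate why the re-wrapping helps, here is a minimal sketch that reproduces the behaviour without NiFi or pandas. It assumes a made-up two-line CSV payload, and the helper name wrap_as_text is hypothetical. An io.BytesIO object stands in for sys.stdin.buffer.

```python
import io

# Hypothetical sample payload: Latin-1 bytes containing "é" (byte 0xe9),
# standing in for what NiFi would pipe into the script's stdin.
raw = "name,city\nRené,Orléans\n".encode("latin-1")


def wrap_as_text(binary_stream, encoding="latin-1"):
    """Wrap a binary stream in a text stream that decodes with `encoding`."""
    return io.TextIOWrapper(binary_stream, encoding=encoding)


# Decoding the same bytes as utf-8 fails, as in the error from the question.
try:
    wrap_as_text(io.BytesIO(raw), encoding="utf-8").read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding as Latin-1 succeeds and restores the accented characters.
text = wrap_as_text(io.BytesIO(raw)).read()
```

In the real script, the binary stream would be sys.stdin.buffer, and the wrapped stream would be passed to pd.read_csv. Recent pandas versions may also accept sys.stdin.buffer directly together with the encoding parameter, but that is an assumption I have not checked in the NiFi setup.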