I'm trying to open an Excel file in my Luigi workflow with pandas.read_excel(), using Luigi's built-in (atomic) target methods. If self.input() is the Luigi target for my Excel document, I want to do something like:
with self.input().open('r') as f:
    pandas.read_excel(f)
or more generally:
with open(filename) as f:
    pandas.read_excel(f)
However, this gives me an error:
*** UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 10: invalid continuation byte
Disclaimer:
The Excel file comes from an external task, so I have no control over what kind of computer it was created on, or whether it contains NAs or blank cells.
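The issue was that the target behind self.input() (which points to where my Excel file is saved) should have been created with format=luigi.format.Nop. Excel files are binary, but by default Luigi opens a target in text mode and tries to decode its bytes, which is what raised the UnicodeDecodeError. Nop passes the raw bytes through unchanged. The upstream task's output() should return something like: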
luigi.LocalTarget('excelfile.xlsx', format=luigi.format.Nop)
With this target definition, I can atomically read using:
with self.input().open() as f:
    df = pd.read_excel(f)
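The same failure can be reproduced without Luigi or pandas, which may help show why the binary format matters. The following is only a minimal sketch using the standard library: the payload bytes and the temporary file are made up for illustration and are not a real Excel file. It writes bytes that are not valid UTF-8, then reads them back once in text mode and once in binary mode.

```python
import os
import tempfile

# Hypothetical stand-in for binary spreadsheet content: 0xd0 starts a
# multi-byte UTF-8 sequence, but the following space is not a valid
# continuation byte, so decoding as UTF-8 must fail.
payload = b"binary\xd0 data"

# Write the bytes to a temporary file (illustrative only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode: Python tries to decode the bytes and raises.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode: the bytes come back untouched, which is what a
    # reader of a binary format such as Excel needs.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(type(text_mode_error).__name__)
print(raw == payload)
```

For the plain open(filename) case from the question, the equivalent fix would presumably be open(filename, 'rb'), or passing the path straight to read_excel.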