Is it possible to convert an .xlsx Excel file to Parquet without first converting it to CSV? I have many Excel files, each with many sheets, and I don't want to convert each sheet to CSV and then to Parquet, so I wonder whether there is a way to convert Excel directly to Parquet. Or is there a way to do it with NiFi? This is how I wanted to do it with a Python script:
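import csv
import xlrd

def csv_from_excel():
    # xlrd reads legacy .xls workbooks (xlrd 2.x no longer reads .xlsx)
    wb = xlrd.open_workbook('your_workbook.xls')
    for name in wb.sheet_names():
        sh = wb.sheet_by_name(name)
        # One CSV per sheet, so later sheets do not overwrite earlier ones
        with open('your_csv_file_%s.csv' % name, 'w', newline='') as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            for rownum in range(sh.nrows):
                wr.writerow(sh.row_values(rownum))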
From a NiFi perspective, the two interesting questions here are:
1. How to read the Excel files. This should not be too difficult when leveraging the XLSX processor, but if your situation is a bit more complex, this elaborate HCC article might be helpful.
2. How to write Parquet. This part is easy: with the PutParquet processor, NiFi can write directly to Parquet.
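If you would rather stay with a Python script as in the question, the CSV step can probably be skipped altogether. Below is a minimal sketch, assuming pandas is installed together with openpyxl (to read .xlsx) and pyarrow (to write Parquet). The function names `parquet_path` and `excel_to_parquet` and the output naming scheme are hypothetical, chosen only for illustration. Sheets with mixed-type columns may need cleaning before they can be written.

```python
import re
from pathlib import Path


def parquet_path(xlsx_path, sheet_name, out_dir):
    """Build one output path per (workbook, sheet) pair."""
    # Replace characters that are awkward in file names with underscores.
    safe_sheet = re.sub(r"[^A-Za-z0-9_-]+", "_", sheet_name).strip("_")
    return Path(out_dir) / f"{Path(xlsx_path).stem}__{safe_sheet}.parquet"


def excel_to_parquet(xlsx_path, out_dir):
    """Write every sheet of one workbook to its own Parquet file."""
    # Imported here so parquet_path() stays usable without pandas installed.
    import pandas as pd

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # sheet_name=None returns a dict {sheet name: DataFrame} for all sheets.
    sheets = pd.read_excel(xlsx_path, sheet_name=None)
    written = []
    for name, frame in sheets.items():
        # Parquet requires string column names.
        frame.columns = [str(c) for c in frame.columns]
        target = parquet_path(xlsx_path, name, out_dir)
        frame.to_parquet(target, index=False)
        written.append(target)
    return written
```

To process a whole directory, you could loop over `Path("excel_dir").glob("*.xlsx")` and call `excel_to_parquet(f, "parquet_out")` for each file; both directory names are placeholders.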