Does anyone know how to workaround a problem of OpenOffice Calc not handling new lines in cells correct?
I have a python script that is dynamically generating an excel workbook using openpyxl via pandas.
The script works fine but when I view cells in OpenOffice that contain newlines all the values are duplicated multiple times. If I open the same file using the Microsoft Excel Viewer everything is displayed correctly and if I use a character other than a new line (e.g. comma, #, etc) it displays fine in both also.
I have a workaround to go into the excel and replace the random character using a macro but would like to avoid that if possible as the process really needs to be completely automated. also because the file will be processed by an another internal tool, I do need these cells to be processed with a new line and I can't change the character.
I have also tried using chr(10) and/or chr(13) but in the former case it just get's replaced in the output by '\n' anyway as expected.
The code I'm currently using is similar to:
test_list = []
for x in range(1,18):
test_list.append([
"value1",
"\n".join(['element1', 'element2', 'element3']),
"value3"
])
data_df = pd.DataFrame(test_list)
fn = r'/path/to/excel/file.xlsx'
writer = pd.ExcelWriter(fn, engine='xlsxwriter')
data_df.to_excel(writer, sheet_name='Data', index=False, header=0)
workbook = writer.book
worksheet = writer.sheets['Data']
worksheet.set_column('A:ZZ',50,
workbook.add_format({'text_wrap': True}))
writer.save()
What happens with the Element data is that it shows in the OpenOffice Calc cell as something like:
Oddly the last item appears to be correct
The same data viewed as a list or via DataFrame.head() appears fine:
pprint(test_list)
[['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
...
['value1', 'element1\nelement2\nelement3', 'value3']]
data_df.head(18):
0 1 2
0 value1 element1\nelement2\nelement3 value3
1 value1 element1\nelement2\nelement3 value3
2 value1 element1\nelement2\nelement3 value3
...
15 value1 element1\nelement2\nelement3 value3
16 value1 element1\nelement2\nelement3 value3
It's just when it get passed to the openpyxl library and viewed in OpenOffice.
Thanks
The code worked fine for me using OpenOffice 4.1.2 on Windows:
For this screenshot, I double-clicked on the bottom of the second row to expand it. Before that, it just showed element3
with a red triangle. But that seems different from the behavior you described.
EDIT:
Ok, I can now confirm the problem. As you said, it occurs with the mysterious number of 18 items. It looks like a bug in OpenOffice, because there is not much difference in the XML files viewed by unzipping file.xlsx
.
I also tried adding CR and LF directly to the XML files, but this just resulted in:
That leaves us with three solutions:
.ods
format instead of MS Office's preferred format.