openoffice calc - newline causes duplicate value in cells (pandas/openpyxl)

Does anyone know how to workaround a problem of OpenOffice Calc not handling new lines in cells correct?

I have a python script that is dynamically generating an excel workbook using openpyxl via pandas.

The script works fine but when I view cells in OpenOffice that contain newlines all the values are duplicated multiple times. If I open the same file using the Microsoft Excel Viewer everything is displayed correctly and if I use a character other than a new line (e.g. comma, #, etc) it displays fine in both also.

I have a workaround to go into the excel and replace the random character using a macro but would like to avoid that if possible as the process really needs to be completely automated. also because the file will be processed by an another internal tool, I do need these cells to be processed with a new line and I can't change the character.

I have also tried using chr(10) and/or chr(13) but in the former case it just get's replaced in the output by '\n' anyway as expected.

The code I'm currently using is similar to:

test_list = []
for x in range(1,18):
    test_list.append([
        "value1",
        "\n".join(['element1', 'element2', 'element3']),
        "value3"
    ])

data_df = pd.DataFrame(test_list)

fn = r'/path/to/excel/file.xlsx'

writer = pd.ExcelWriter(fn, engine='xlsxwriter')
data_df.to_excel(writer, sheet_name='Data', index=False, header=0)
workbook = writer.book
worksheet = writer.sheets['Data']
worksheet.set_column('A:ZZ',50,
                     workbook.add_format({'text_wrap': True}))
writer.save()

What happens with the Element data is that it shows in the OpenOffice Calc cell as something like:

Oddly the last item appears to be correct

The same data viewed as a list or via DataFrame.head() appears fine:

pprint(test_list)
[['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
...
['value1', 'element1\nelement2\nelement3', 'value3']]

data_df.head(18):
         0                             1       2
0   value1  element1\nelement2\nelement3  value3
1   value1  element1\nelement2\nelement3  value3
2   value1  element1\nelement2\nelement3  value3
...
15  value1  element1\nelement2\nelement3  value3
16  value1  element1\nelement2\nelement3  value3

It's just when it get passed to the openpyxl library and viewed in OpenOffice.

Thanks

Solution

The code worked fine for me using OpenOffice 4.1.2 on Windows:

For this screenshot, I double-clicked on the bottom of the second row to expand it. Before that, it just showed element3 with a red triangle. But that seems different from the behavior you described.

EDIT:

Ok, I can now confirm the problem. As you said, it occurs with the mysterious number of 18 items. It looks like a bug in OpenOffice, because there is not much difference in the XML files viewed by unzipping file.xlsx.

I also tried adding CR and LF directly to the XML files, but this just resulted in:

That leaves us with three solutions:

Use LibreOffice instead, which does not have this problem (tested LO 5.1.0.3).
Report the bug and wait for a new version.
Use OpenOffice's preferred .ods format instead of MS Office's preferred format.