python pandas numpy binary point-clouds

Binary data written with tobytes() cannot be read by software on Windows


I am trying to write some xyz point data to a .ply file using Python.

I am using this script, which writes a pandas DataFrame to a binary format via a recarray and the NumPy method tobytes():

import pandas as pd
import numpy as np

pc = pd.read_csv('points.txt')

with open('some_file.ply', 'w') as ply:

    ply.write("ply\n")
    ply.write('format binary_little_endian 1.0\n')
    ply.write("comment Author: Phil Wilkes\n")
    ply.write("obj_info generated with pcd2ply.py\n")
    ply.write("element vertex {}\n".format(len(pc)))
    ply.write("property float x\n")
    ply.write("property float y\n")
    ply.write("property float z\n")
    ply.write("end_header\n")

    pc[['x', 'y', 'z']] = pc[['x', 'y', 'z']].astype('f4')

    ply.write(pc[['x', 'y', 'z']].to_records(index=False).tobytes())

This script works fine on my Mac, and software like CloudCompare can read the result; however, when I run the same script on a Windows machine, CloudCompare can read the header info but garbles the binary content.

When I read a text-file version into CloudCompare and export it as a binary file, both the Linux and Windows versions can read it, but the file contents are different.

Here is the version that is produced by the above script, here is the version produced by CloudCompare on Windows, and here is the raw data.


Solution

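  • Turns out I needed to specify what line ending to use when opening the file. In text mode on Windows, every '\n' written is translated to '\r\n', so any 0x0A byte in the binary payload gets a 0x0D inserted in front of it and the data after it is shifted: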

    open(output_name, 'w', newline='\n')
    

    After rewriting for Python 3, the file has to be opened twice, once in text mode for the header and once in binary mode for the binary component, because a text-mode file no longer accepts bytes. The new version looks like:

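    import pandas as pd
    import numpy as np
    
    output_name = 'some_file.ply'  # output path (file name taken from the question)
    cols = ['x', 'y', 'z']         # coordinate columns to export
    
    pc = pd.read_csv('points.txt')
    
    # Text header: newline='\n' stops Windows from turning '\n' into '\r\n'.
    with open(output_name, 'w', newline='\n') as ply:
    
        ply.write("ply\n")
        ply.write('format binary_little_endian 1.0\n')
        ply.write("comment Author: Phil Wilkes\n")
        ply.write("obj_info generated with pcd2ply.py\n")
        ply.write("element vertex {}\n".format(len(pc)))
        ply.write("property float x\n")
        ply.write("property float y\n")
        ply.write("property float z\n")
        ply.write("end_header\n")
    
    # Binary payload: append in binary mode, where no newline translation happens.
    with open(output_name, 'ab') as ply:
        # Cast to 32-bit floats to match "property float" in the header.
        points = pc[cols].astype('f4')
        ply.write(points.to_records(index=False).tobytes())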
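
For anyone who wants to see the effect in isolation, here is a minimal sketch, not the original script. The function names (emulate_windows_text_write, write_ply_binary, read_ply_payload) are made up for illustration, and the example skips pandas and works on a plain NumPy array. The first function reproduces the Windows '\n' to '\r\n' translation on any platform by opening a text file with newline='\r\n'. The other two show one possible alternative to opening the file twice: encode the header to ASCII and write everything in a single binary-mode pass. The value 8.625 is used because its little-endian float32 encoding happens to contain a 0x0A byte.

```python
import os
import tempfile

import numpy as np


def emulate_windows_text_write(payload):
    """Return the bytes stored when payload goes through CRLF-translating text mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline='\r\n' mimics the default text-mode translation on Windows;
        # latin-1 maps each byte value 0-255 to one character and back.
        with open(path, 'w', newline='\r\n', encoding='latin-1') as f:
            f.write(payload.decode('latin-1'))
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)


def write_ply_binary(path, xyz):
    """Write an (N, 3) array as a binary little-endian PLY; return the header size in bytes."""
    # '<f4' forces little-endian float32 regardless of the host byte order.
    xyz = np.ascontiguousarray(xyz, dtype='<f4')
    header = (
        "ply\n"
        "format binary_little_endian 1.0\n"
        "element vertex {}\n"
        "property float x\n"
        "property float y\n"
        "property float z\n"
        "end_header\n"
    ).format(len(xyz)).encode('ascii')
    # Binary mode: bytes are written unchanged on every platform.
    with open(path, 'wb') as ply:
        ply.write(header)
        ply.write(xyz.tobytes())
    return len(header)


def read_ply_payload(path, header_len, n_points):
    """Read the float32 payload back, skipping header_len bytes of header."""
    with open(path, 'rb') as ply:
        ply.seek(header_len)
        return np.frombuffer(ply.read(), dtype='<f4').reshape(n_points, 3)


if __name__ == "__main__":
    payload = np.array([8.625], dtype='<f4').tobytes()
    print(payload)                              # b'\x00\x00\nA'
    print(emulate_windows_text_write(payload))  # b'\x00\x00\r\nA'
```

The four-byte float comes back as five bytes after the emulated text-mode write, which is the kind of shift that makes a reader misinterpret everything after the first affected value. Writing the header and the payload through the same 'wb' handle avoids the problem without needing the newline argument at all.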