python python-3.x binaryfiles mesh pyparsing

Parsing binary Stanford polygon files (PLY) with Pyparsing

For a larger project, I'm writing a parser for Stanford polygon (PLY) files. The example in a GitHub Gist can currently parse ASCII-format PLY files into a Mesh data abstraction. It also contains a description of the actual grammar, for those inclined.

However, the format definition (PLY - Polygon File Format) also includes two binary formats (little-endian and big-endian). Since those two formats are much more common (and more storage-efficient), I would like to be able to parse those files with pyparsing as well.

I'd be grateful for advice on how to do that, if it is possible at all.

In a binary PLY file, the header is an ASCII description of the data layout, and the body contains the actual data in binary form. An example (the data in brackets are hex bytes):

ply
format binary_little_endian 1.0
element vertex 1
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
property uchar alpha
end_header
[84 72 F1 C1 D8 FD 9F C1 00 00 00 00 3B 45 CB FF]
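To make the layout concrete: going by the header above, the 16 body bytes should form a single vertex record of three little-endian floats followed by four unsigned bytes. A minimal sketch of decoding them by hand with Python's struct module (the variable names are only for illustration, and the format string is my reading of the header, not something taken from the parser):

```python
import struct

# The 16 body bytes from the example, written as one hex string.
body = bytes.fromhex("8472F1C1D8FD9FC1000000003B45CBFF")

# "<" = little-endian without padding, "3f" = x, y, z as 32-bit floats,
# "4B" = red, green, blue, alpha as unsigned bytes.
vertex = struct.Struct("<3f4B")

x, y, z, red, green, blue, alpha = vertex.unpack(body)
print(x, y, z, red, green, blue, alpha)
```

The record size reported by `vertex.size` matches the 16 bytes in the body, which suggests the header is being read correctly.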

My first approach was to load the input file in binary mode (so that the parser sees bytes instead of str) and adapt the parser accordingly, but this throws pyparsing off track, as the traceback below shows. Also, I don't really know how to tell pyparsing how to handle groups of bytes.

  File "components.py", line 338, in create
    mesh = PlyParser.create().load(mesh_path)
  File "model_parser.py", line 120, in create
    property_position = aggregate_property("position", b"x", b"y", b"z")
  File "model_parser.py", line 113, in aggregate_property
    aggregates.append(pp.Group(property_simple_prefix + keyword_or(*keywords)("name")))
  File "model_parser.py", line 87, in keyword_or
    return pp.Or(pp.CaselessKeyword(literal) for literal in keywords)
  File "pyparsing.py", line 3418, in __init__
    super(Or,self).__init__(exprs, savelist)
  File "pyparsing.py", line 3222, in __init__
    exprs = list(exprs)
  File "model_parser.py", line 87, in <genexpr>
    return pp.Or(pp.CaselessKeyword(literal) for literal in keywords)
  File "pyparsing.py", line 2496, in __init__
    super(CaselessKeyword,self).__init__( matchString, identChars, caseless=True )
  File "pyparsing.py", line 2422, in __init__
    self.matchLen = len(matchString)
TypeError: object of type 'int' has no len()

Solution

What you might want to try is to open the file as text, use pyparsing to parse the header, and capture the end position of the "end_header" token. Use the structure information extracted from the header to build a Python struct reader that will process the binary content. Then reopen the file as binary, seek to that position, and use the struct reader to load the binary content. This is probably simpler than twisting pyparsing to handle both text and binary.
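A rough sketch of that split between header and body, using only the standard library. To keep it self-contained, plain line splitting stands in for the pyparsing grammar, and the header is read from a binary handle and decoded as ASCII so that the stream is already positioned at the first body byte (a variation on reopening and seeking). The names `read_header`, `read_body` and `PLY_TO_STRUCT` are hypothetical; list properties and the alternative type names some files use are not handled.

```python
import io
import struct

# Basic PLY scalar type names mapped to struct format characters.
PLY_TO_STRUCT = {
    "char": "b", "uchar": "B", "short": "h", "ushort": "H",
    "int": "i", "uint": "I", "float": "f", "double": "d",
}
BYTE_ORDER = {"binary_little_endian": "<", "binary_big_endian": ">"}


def read_header(stream):
    """Consume the ASCII header, leaving the stream at the first body byte.

    Returns (byte_order, elements), where elements is a list of
    (element_name, count, [(type_name, property_name), ...]).
    """
    byte_order = None
    elements = []
    while True:
        raw = stream.readline()
        if not raw:
            raise ValueError("end_header not found")
        words = raw.decode("ascii").split()
        if not words:
            continue
        if words[0] == "end_header":
            break
        if words[0] == "format":
            byte_order = BYTE_ORDER[words[1]]  # KeyError for the ascii format
        elif words[0] == "element":
            elements.append((words[1], int(words[2]), []))
        elif words[0] == "property":
            if words[1] == "list":
                raise NotImplementedError("list properties are variable-length")
            elements[-1][2].append((words[1], words[2]))
    return byte_order, elements


def read_body(stream, byte_order, elements):
    """Unpack every element record into a dict keyed by property name."""
    data = {}
    for name, count, props in elements:
        fmt = byte_order + "".join(PLY_TO_STRUCT[t] for t, _ in props)
        record = struct.Struct(fmt)
        names = [prop_name for _, prop_name in props]
        data[name] = [
            dict(zip(names, record.unpack(stream.read(record.size))))
            for _ in range(count)
        ]
    return data


# The example file from the question, rebuilt in memory.
sample = (
    b"ply\nformat binary_little_endian 1.0\nelement vertex 1\n"
    b"property float x\nproperty float y\nproperty float z\n"
    b"property uchar red\nproperty uchar green\nproperty uchar blue\n"
    b"property uchar alpha\nend_header\n"
    + bytes.fromhex("8472F1C1D8FD9FC1000000003B45CBFF")
)

stream = io.BytesIO(sample)
byte_order, elements = read_header(stream)
mesh_data = read_body(stream, byte_order, elements)
print(mesh_data["vertex"][0])
```

With a real file, `open(path, "rb")` would take the place of `io.BytesIO`. If the header is parsed with pyparsing instead, the same struct-building step applies to whatever element and property results the grammar produces.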