Tags: python, file, point-clouds

Does Python have a standard PTS reader or parser?


I have the following file:

version: 1
n_points:  68
{
55.866278 286.258077
54.784191 315.123248
62.148364 348.908294
83.264019 377.625584
102.690421 403.808995
125.495327 438.438668
140.698598 471.379089
158.435748 501.785631
184.471278 511.002579
225.857960 504.171628
264.555990 477.159805
298.168768 447.523374
332.502678 411.220089
350.641672 372.839985
355.004106 324.781552
349.265206 270.707703
338.314674 224.205227
33.431075 238.262266
42.204378 227.503948
53.939564 227.904931
68.298209 232.202002
82.271511 239.951519
129.480996 229.905585
157.960824 211.545631
189.465597 204.068108
220.288164 208.206246
249.905282 218.863196
110.089281 266.422557
108.368067 298.896910
105.018473 331.956957
102.889410 363.542719
101.713553 379.256535
114.636047 383.331785
129.543556 384.250352
140.033133 375.640569
152.523364 366.956846
60.326871 270.980865
67.198221 257.376350
92.335775 259.211865
102.394658 274.137548
86.227917 277.162353
68.397650 277.343621
165.340638 263.379230
173.385917 246.412765
198.024842 240.895985
223.488685 247.333206
207.218336 260.967007
184.619159 265.379884
122.903148 418.405102
114.539655 407.643816
123.642553 404.120397
136.821841 407.806210
149.926926 403.069590
196.680098 399.302500
221.946232 394.444167
203.262878 417.808844
164.318232 440.472370
145.915650 444.015386
136.436942 442.897031
125.273506 429.073840
124.666341 420.331816
130.710965 421.709666
141.438004 423.161457
155.870784 418.844649
213.410389 396.978046
155.870784 418.844649
141.438004 423.161457
130.710965 421.709666
}

The file extension is .pts.

Is there some standard reader for this file?

The code I have (downloaded from some github) which tries to read it is

landmark = np.loadtxt(image_landmarks_path)

which fails on

{ValueError}could not convert string to float: 'version:'

which makes sense.

I can't change the file, and I wonder if I have to write my own parser or if this is some standard format?


Solution

  • It appears to be a 2D point cloud file; I think it's called the Landmark PTS format. The closest Python reference I could find is an issue for a 3D-morphable face model-fitting library, which references a sample file that matches yours. Most .pts point cloud tools expect to work with 3D files, so they may not work out of the box with this one.

    So no, there doesn't appear to be a standard reader for this; the closest I came to a library that reads the format is this GitHub repository, but it has a drawback: it reads all the data into memory before manually parsing it into Python float values.

    However, the format is very simple (as the referenced issue notes), so you can read the data with just numpy.loadtxt(); the simplest approach is to treat all the non-data lines as comments:

    def read_pts(filename):
        return np.loadtxt(filename, comments=("version:", "n_points:", "{", "}"))
    

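    As a quick usage sketch: assuming the file from the question is saved as landmarks.pts (a hypothetical path), calling that helper should give a 68×2 array:

    landmarks = read_pts("landmarks.pts")  # hypothetical path to the file above
    print(landmarks.shape)  # -> (68, 2)
    print(landmarks[0])     # -> the first point, [ 55.866278 286.258077]
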
    Or, if you are not sure about the validity of a batch of such files and want to make sure you only read valid ones, you could pre-process the file to read the header first (validating the number of points and the version, and allowing for comments and image size info):

    from pathlib import Path
    from typing import Union
    import numpy as np
    
    def read_pts(filename: Union[str, bytes, Path]) -> np.ndarray:
        """Read a .PTS landmarks file into a numpy array"""
        with open(filename, 'rb') as f:
            # process the PTS header for n_rows and version information
            rows = version = None
            for line in f:
                if line.startswith(b"//"):  # comment line, skip
                    continue
                header, _, value = line.strip().partition(b':')
                if not value:
                    if header != b'{':
                        raise ValueError("Not a valid pts file")
                    if version != 1:
                        raise ValueError(f"Not a supported PTS version: {version}")
                    break
                try:
                    if header == b"n_points":
                        rows = int(value)
                    elif header == b"version":
                        version = float(value)  # version: 1 or version: 1.0
                    elif not header.startswith(b"image_size_"):
                        # returning the image_size_* data is left as an exercise
                        # for the reader.
                        raise ValueError
                except ValueError:
                    raise ValueError("Not a valid pts file")
    
            # if there was no n_points line, make sure the closing } line
            # is not going to trip up the numpy reader by marking it as a comment
            points = np.loadtxt(f, max_rows=rows, comments="}")
    
        if rows is not None and len(points) < rows:
            raise ValueError(f"Failed to load all {rows} points")
        return points
    

    That function is about as production-ready as I can make it, short of also providing a full test suite.

    This uses the n_points: line to tell np.loadtxt() how many rows to read, and moves the file position forward to just past the { opener. It also raises a ValueError if there is no version: 1 line present, or if the header contains anything other than version, n_points: <int>, image_size_* and comment lines.
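
    As a small sketch of the validation behaviour (the file name and contents below are made up for illustration), a file with an unsupported version is rejected:

    import os
    import tempfile

    bad_header = b"version: 2\nn_points: 1\n{\n1.0 2.0\n}\n"
    with tempfile.NamedTemporaryFile(suffix=".pts", delete=False) as tmp:
        tmp.write(bad_header)  # write a throwaway .pts file claiming version: 2
    try:
        read_pts(tmp.name)
    except ValueError as exc:
        print(exc)  # -> Not a supported PTS version: 2.0
    finally:
        os.unlink(tmp.name)  # clean up the temporary file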

    Both produce a 68×2 matrix of float64 values for your file, but should work with points of any dimensionality.
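
    To illustrate that the number of columns simply follows the data, here is a small sketch feeding a made-up three-values-per-line body to the loadtxt-based reader through io.StringIO instead of a real file:

    import io
    import numpy as np

    fake_3d = io.StringIO(
        "version: 1\n"
        "n_points: 2\n"
        "{\n"
        "1.0 2.0 3.0\n"
        "4.0 5.0 6.0\n"
        "}\n"
    )
    # the same comments trick as above strips the header and braces
    points = np.loadtxt(fake_3d, comments=("version:", "n_points:", "{", "}"))
    print(points.shape)  # -> (2, 3)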

    Circling back to the EOS library issue referenced above: its demo code for reading the data hand-parses the lines, also reading them all into memory first. I also found Facebook Research's PTS dataset loading code (for .pts files with 3 values per line), which is just as manual.