Tags: python, opencv, yaml, pyyaml

Error while reading YAML file in python


I have a yaml file that looks like this:

%YAML 1.0
temp: !!opencv-matrix
 rows: 2
 cols: 23
 dt: f
 data: [ 3.35620789e+02, 3.64299591e+02, 3.95790131e+02,
   4.39863068e+02, 4.68664948e+02, 4.93518127e+02, 4.17159943e+02,
   4.21060364e+02, 3.99990234e+02, 4.17867157e+02, 4.34151215e+02,
   3.56201202e+02, 3.77741028e+02, 3.87051544e+02, 3.76879913e+02,
   4.42746796e+02, 4.52483917e+02, 4.73469604e+02, 4.52954742e+02,
   3.78402283e+02, 4.17679047e+02, 4.50588501e+02, 4.16388153e+02,
   9.05276794e+01, 9.21245193e+01, 1.02799362e+02, 9.93146744e+01,
   8.40704346e+01, 7.84236526e+01, 1.15820358e+02, 1.76747055e+02,
   1.61153061e+02, 1.68130676e+02, 1.58446228e+02, 1.07421455e+02,
   1.03407494e+02, 1.05380608e+02, 1.08374542e+02, 1.01048920e+02,
   9.76309204e+01, 9.83933716e+01, 1.02486870e+02, 1.71890350e+02,
   1.81417206e+02, 1.66303802e+02, 1.95539871e+02 ]

It is basically an OpenCV matrix, and I created the file from C++ code. Now I want to read this file in Python, and I have this code:

import yaml
with open("reference_3d.yml") as fin:
     rfr = yaml.load(fin.read())

But when I run the code, it gives me this error:

Traceback (most recent call last):
  File "scatter_plot.py", line 15, in <module>
    rfr = yaml.load(fin.read())
  File "/usr/local/lib/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/local/lib/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python2.7/site-packages/yaml/composer.py", line 35, in get_single_node
    if not self.check_event(StreamEndEvent):
  File "/usr/local/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/local/lib/python2.7/site-packages/yaml/parser.py", line 157, in parse_implicit_document_start
    return self.parse_document_start()
  File "/usr/local/lib/python2.7/site-packages/yaml/parser.py", line 174, in parse_document_start
    self.peek_token().start_mark)
yaml.parser.ParserError: expected '<document start>', but found '<scalar>'
in "<string>", line 2, column 2:
 temp: !!opencv-matrix
 ^

Any idea how I should resolve this error?


Solution

  • The problem here is that your C++ program generates (invalid) YAML 1.0, which you then try to parse with a Python parser that can handle most of YAML 1.1.

    The YAML 1.0 specification doesn't have many examples, but its production for documents states that the optional c-ns-directive should come in the document header, after the document start marker (rules 48 and 49), and the specification clearly states that the directive should have the form %YAML:1.0. A correct YAML 1.0 document should therefore start with:

    ---
    %YAML:1.0
    

    However, even if the header were output correctly, PyYAML would not be able to read a file in this older YAML version.

    Since the latest YAML specification (1.2) is from 2009, the best thing you can do is to switch both your C++ program and your Python program to a 1.2-compatible library. yaml.org indicates that for C++ this would be yaml-cpp, and for Python it would have to be ruamel.yaml (disclaimer: I am the author of that package).

    The YAML file will then look like this (yaml-cpp might change the tag and maybe even the rest of the dump):

    %YAML 1.2
    ---
    temp: !!opencv-matrix
     rows: 2
     cols: 23
     dt: f
     data: [ 3.35620789e+02, 3.64299591e+02, 3.95790131e+02,
       4.39863068e+02, 4.68664948e+02, 4.93518127e+02, 4.17159943e+02,
       4.21060364e+02, 3.99990234e+02, 4.17867157e+02, 4.34151215e+02,
       3.56201202e+02, 3.77741028e+02, 3.87051544e+02, 3.76879913e+02,
       4.42746796e+02, 4.52483917e+02, 4.73469604e+02, 4.52954742e+02,
       3.78402283e+02, 4.17679047e+02, 4.50588501e+02, 4.16388153e+02,
       9.05276794e+01, 9.21245193e+01, 1.02799362e+02, 9.93146744e+01,
       8.40704346e+01, 7.84236526e+01, 1.15820358e+02, 1.76747055e+02,
       1.61153061e+02, 1.68130676e+02, 1.58446228e+02, 1.07421455e+02,
       1.03407494e+02, 1.05380608e+02, 1.08374542e+02, 1.01048920e+02,
       9.76309204e+01, 9.83933716e+01, 1.02486870e+02, 1.71890350e+02,
       1.81417206e+02, 1.66303802e+02, 1.95539871e+02 ]
    

    You can then load it like this:

    from ruamel.yaml import YAML
    
    yaml = YAML()
    
    with open('reference_3d.yaml') as fin:
        rfr = yaml.load(fin)
    
    print(rfr['temp']['data'][rfr['temp']['cols']-1])
    

    to get the last value of the first data row (416.388153).
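
The index arithmetic above (`cols - 1` for the last value of the first row) generalizes to the whole matrix. Below is a minimal sketch, assuming the `data` list is stored in row-major order; the helper name `to_rows` and the small stand-in dictionary are hypothetical and only for illustration, not part of ruamel.yaml or OpenCV.

```python
def to_rows(data, rows, cols):
    """Split a flat list into `rows` lists of length `cols`.

    Assumption: the values are stored row by row (row-major order).
    """
    if len(data) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(data)))
    return [list(data[r * cols:(r + 1) * cols]) for r in range(rows)]


# Hypothetical stand-in for rfr['temp']; the real file has rows=2, cols=23.
temp = {"rows": 2, "cols": 3, "data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
matrix = to_rows(temp["data"], temp["rows"], temp["cols"])

print(matrix[0][-1])  # last value of the first row, i.e. data[cols - 1]
print(matrix[1][0])   # first value of the second row, i.e. data[cols]
```

With the loaded file, `to_rows(rfr['temp']['data'], rfr['temp']['rows'], rfr['temp']['cols'])` should give two rows of 23 values each, under the same row-major assumption.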

    If you cannot, or are unwilling to, change your C++ program, then just use ruamel.yaml and skip the first line of the YAML file:

    from ruamel.yaml import YAML
    
    yaml = YAML()
    
    with open('reference_3d.yaml') as fin:
        fin.readline()
        rfr = yaml.load(fin)
    
    print(rfr['temp']['data'][rfr['temp']['cols']-1])
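
The unconditional `fin.readline()` above also drops the first line of files that do not start with the old directive. A slightly more defensive sketch, assuming Python 3 and a text-mode stream, skips the line only when it really is a YAML 1.0 directive. The helper name `skip_yaml_1_0_directive` is hypothetical and uses only the standard library.

```python
import io
import re

# Matches the directive as written in the question's file ("%YAML 1.0")
# and the colon form quoted above ("%YAML:1.0").
_YAML_1_0_DIRECTIVE = re.compile(r"^%YAML[ :]1\.0\s*$")


def skip_yaml_1_0_directive(stream):
    """Return a text stream without a leading YAML 1.0 directive line.

    If the first line is not such a directive, it is kept, so files
    without the old header pass through unchanged.
    """
    first_line = stream.readline()
    rest = stream.read()
    if _YAML_1_0_DIRECTIVE.match(first_line):
        return io.StringIO(rest)
    return io.StringIO(first_line + rest)


sample = io.StringIO("%YAML 1.0\ntemp:\n  rows: 2\n")
cleaned = skip_yaml_1_0_directive(sample).read()
print(cleaned)
```

The returned stream can be passed to `yaml.load(...)` in place of `fin` in the snippet above.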
    

    Please note that the recommended extension for files containing YAML documents has been .yaml since at least September 2006.