How to get details from PyYAML exception?


I want to gracefully notify the user exactly where their mucked up YAML file is flawed. Line 288 of python-3.4.1/lib/python-3.4/yaml/scanner.py is where a common parsing error is reported by raising an exception:

raise ScannerError("while scanning a simple key", key.mark,
                   "could not found expected ':'", self.get_mark())

I am struggling with how to report it.

try:
    parsed_yaml = yaml.safe_load(txt)
except yaml.YAMLError as exc:
    print("scanner error 1")
    if hasattr(exc, 'problem_mark'):
        mark = exc.problem_mark
        print("Error parsing Yaml file at line %s, column %s." %
              (mark.line, mark.column+1))
    else:
        print("Something went wrong while parsing yaml file")
    return

This gives

$ yaml_parse.py
scanner error 1
Error parsing Yaml file at line 1508, column 9.

But how do I get the error text and whatever is in key.mark and the other mark?

More usefully, how do I examine the PyYAML source to figure this out? The ScannerError class seems to ignore the parameters (from scanner.py line 32):

class ScannerError(MarkedYAMLError):
     pass

Solution

  • The ScannerError class defines no methods of its own (the pass statement works like a no-op). That makes it functionally identical to its base class MarkedYAMLError, and that is the class that stores the data. From error.py:

    class MarkedYAMLError(YAMLError):
        def __init__(self, context=None, context_mark=None,
                     problem=None, problem_mark=None, note=None):
            self.context = context
            self.context_mark = context_mark
            self.problem = problem
            self.problem_mark = problem_mark
            self.note = note
    
        def __str__(self):
            lines = []
            if self.context is not None:
                lines.append(self.context)
            if self.context_mark is not None  \
               and (self.problem is None or self.problem_mark is None
                    or self.context_mark.name != self.problem_mark.name
                    or self.context_mark.line != self.problem_mark.line
                    or self.context_mark.column != self.problem_mark.column):
                lines.append(str(self.context_mark))
            if self.problem is not None:
                lines.append(self.problem)
            if self.problem_mark is not None:
                lines.append(str(self.problem_mark))
            if self.note is not None:
                lines.append(self.note)
            return '\n'.join(lines)
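
The attributes set in that __init__ are what an except clause can read directly. A minimal sketch, assuming plain PyYAML (import yaml) and a hypothetical helper name describe_yaml_error that is not part of the library:

```python
import yaml


def describe_yaml_error(text):
    """Hypothetical helper: return None if text parses, else error details."""
    try:
        yaml.safe_load(text)
    except yaml.MarkedYAMLError as exc:
        # context/problem hold the two message texts passed to the exception.
        details = {"context": exc.context, "problem": exc.problem, "note": exc.note}
        # Marks store 0-based positions; add 1 to get human-readable ones.
        for name in ("context_mark", "problem_mark"):
            mark = getattr(exc, name)
            details[name] = None if mark is None else (mark.line + 1, mark.column + 1)
        return details
    except yaml.YAMLError as exc:
        # Errors without marks only offer their string form.
        return {"problem": str(exc)}
    return None


if __name__ == "__main__":
    print(describe_yaml_error("hallo: 1\nbye\n"))
```

Catching MarkedYAMLError before the more general YAMLError keeps the fallback for errors that carry no position information.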
    

    If you start with a file txt.yaml:

    hallo: 1
    bye
    

    and a test.py:

    import ruamel.yaml as yaml
    txt = open('txt.yaml')
    data = yaml.load(txt, yaml.SafeLoader)
    

    you will get the not so descriptive error:

    ...
    ruamel.yaml.scanner.ScannerError: while scanning a simple key
      in "txt.yaml", line 2, column 1
    could not find expected ':'
      in "txt.yaml", line 3, column 1
    

    However, if you change the second line of test.py so that the file contents are read into a string first:

    import ruamel.yaml as yaml
    txt = open('txt.yaml').read()
    data = yaml.load(txt, yaml.SafeLoader)
    

    you get the more interesting error description:

    ...
    ruamel.yaml.scanner.ScannerError: while scanning a simple key
      in "<byte string>", line 2, column 1:
        bye
        ^
    could not find expected ':'
      in "<byte string>", line 3, column 1:
    
        ^
    

    This difference is because get_mark() (in reader.py) has more context to point to if it is not handling a stream:

    def get_mark(self):
        if self.stream is None:
            return Mark(self.name, self.index, self.line, self.column,
                        self.buffer, self.pointer)
        else:
            return Mark(self.name, self.index, self.line, self.column,
                        None, None)
    

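    These marks end up in the context_mark and problem_mark attributes of the exception (key.mark and self.get_mark() in the raise statement from the question), and the two message texts end up in context and problem. Look at those attributes when you want to provide more context for the error. But as shown above, the source snippet is only available if you parse the YAML input from a buffer, not from a stream.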
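
The buffer-versus-stream difference can be checked with the get_snippet() method of a mark. This is a sketch assuming plain PyYAML; snippet_for and BAD_YAML are names made up for the illustration, and io.StringIO stands in for an open file:

```python
import io

import yaml

BAD_YAML = "hallo: 1\nbye\n"  # same content as txt.yaml above


def snippet_for(source):
    """Hypothetical helper: snippet attached to context_mark, or None."""
    try:
        yaml.safe_load(source)
    except yaml.MarkedYAMLError as exc:
        mark = exc.context_mark
        # get_snippet() returns None when the mark was created without a buffer.
        return mark.get_snippet() if mark is not None else None
    return None


# A string is kept as a buffer, so the mark can quote the offending line.
from_string = snippet_for(BAD_YAML)
# A file-like object is read as a stream, so no snippet is available.
from_stream = snippet_for(io.StringIO(BAD_YAML))

if __name__ == "__main__":
    print(from_string)
    print(from_stream)
```

Reading the whole file into a string before loading costs memory for very large inputs, which is the trade-off for the richer message.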

    Searching the PyYAML source is a hard task: the methods are spread over various classes, and those classes are all parent classes of either the Loader or the Dumper, so every method ends up attached to one of those two. The best help in tracing a call is to grep for def method_name(, as at least the method names are all distinct (they have to be for this scheme to work).
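
As an alternative to grep, Python's own introspection can report which parent class supplies a given method. A small sketch, assuming plain PyYAML and a hypothetical helper named defining_class:

```python
import inspect

import yaml


def defining_class(cls, method_name):
    """Hypothetical helper: first class in the MRO that defines method_name."""
    for klass in inspect.getmro(cls):
        # vars() only lists names defined on the class itself, not inherited ones.
        if method_name in vars(klass):
            return klass
    return None


owner = defining_class(yaml.SafeLoader, "get_mark")

if __name__ == "__main__":
    # Module and class name tell you which source file to open.
    print(owner.__module__, owner.__name__)
    print(inspect.getsourcefile(owner))
```

Because the lookup follows the method resolution order, it finds the same definition that a call on the loader would actually use.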


    In the above I used my enhanced version of PyYAML called ruamel.yaml; for the purpose of this answer the two should work the same.