
How to get the line number of a YAML validation error with Cerberus?


I am using Cerberus to validate my YAML file against a predefined schema, as shown below:

import yaml
import cerberus

schema_text = '''
name:
type: string
age:
type: integer
min: 10
'''

input_text = '''
name: Little Joe                     #   *(Line 1)*
age: 5                               #   *(Line 2)*
'''

schema_yaml = yaml.safe_load(schema_text)
input_yaml = yaml.safe_load(input_text)
v = cerberus.Validator()
v.validate(input_yaml, schema_yaml)

v.errors
# {'age': ['min value is 10']}

When reporting YAML validation errors, it would be really helpful to show the user not just the error message but also the line number(s) where the error occurs, so they can quickly find and fix the problem.

  • For example:

{'age': ['min value is 10..error found at line number 2']}

Is there such an option available in Cerberus? Any leads would be very helpful.


Solution

  • There are several things you should be aware of. First of all, your schema_yaml is invalid YAML, because all keys of a single mapping need to be unique. PyYAML will, however, happily load it, silently overwriting `string` with `integer` for the duplicate `type` key. What you actually want is an error message, so that you notice you need to indent some of the lines in schema_yaml. You should also make it a habit to put a backslash right after the opening triple quotes; otherwise your string starts with an empty line and your line numbers will be off by one.

    Using ruamel.yaml (disclaimer: I am the author of that package) you can record, while each mapping is being constructed, the line on which every key appears. The start_mark of the key node holds that line number (counting from 0):

    from collections.abc import Hashable  # used by the hashability check below

    import cerberus
    import ruamel.yaml
    
    schema_text = '''\
    name:
      type: string
    age:
      type: integer
      min: 10
    '''
    
    input_text = '''\
    name: Little Joe                  #   *(Line 1)*
    age: 5                            #   *(Line 2)*
    '''
    
    yaml = ruamel.yaml.YAML(typ='safe')  # no need for line numbers in the schema
    schema = yaml.load(schema_text)
    v = cerberus.Validator()
    yaml = ruamel.yaml.YAML()  # round-trip loader for the input; its constructor is patched below
    
    def my_construct_mapping(self, node, maptyp, deep=False):
        # Simplified replacement for the round-trip construct_mapping that
        # additionally records the (1-based) line of every key
        if not isinstance(node, ruamel.yaml.nodes.MappingNode):
            raise ruamel.yaml.constructor.ConstructorError(
                None, None, f'expected a mapping node, but found {node.id!s}', node.start_mark,
            )
        total_mapping = maptyp
        if getattr(node, 'merge', None) is not None:
            todo = [(node.merge, False), (node.value, False)]
        else:
            todo = [(node.value, True)]
        for values, check in todo:
            mapping = self.yaml_base_dict_type()
            for key_node, value_node in values:
                # keys can be list -> deep
                key = self.construct_object(key_node, deep=True)
                # lists are not hashable, but tuples are
                if not isinstance(key, Hashable):
                    if isinstance(key, list):
                        key = tuple(key)
                if not isinstance(key, Hashable):
                    raise ruamel.yaml.constructor.ConstructorError(
                        'while constructing a mapping',
                        node.start_mark,
                        'found unhashable key',
                        key_node.start_mark,
                    )
                value = self.construct_object(value_node, deep=deep)
                if check:
                    if self.check_mapping_key(node, key_node, mapping, key, value):
                        mapping[key] = value
                else:
                    mapping[key] = value
                # self.loader is the YAML instance, so this becomes yaml.keyline;
                # it is keyed by key name only, so equal keys in nested mappings overwrite each other
                if not hasattr(self.loader, 'keyline'):
                    self.loader.keyline = {}
                self.loader.keyline[key] = key_node.start_mark.line + 1  # ruamel.yaml counts lines from 0
            total_mapping.update(mapping)
        return total_mapping
    
    yaml.Constructor.construct_mapping = my_construct_mapping
    
    
    data = yaml.load(input_text)
    v.validate(data, schema)
    keyline = getattr(yaml, 'keyline', {})  # not set if the input contained no mapping
    for key, val in v.errors.items():
        # keys missing from the input (e.g. a "required field" error) have no line number
        line = keyline.get(key, '?')
        print(f'error for key "{key}" at line {line}: {"; ".join(str(e) for e in val)}')
    

    which gives:

    error for key "age" at line 2: min value is 10
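The off-by-one remark about the backslash is easy to check with plain Python: a triple-quoted string whose content starts on the line after the opening quotes begins with a newline, so every YAML line shifts down by one. A minimal sketch; the helper `line_of` is a hypothetical name used only for illustration:

```python
# The same two-line document, once without and once with a backslash after the opening quotes
without_backslash = '''
name: Little Joe
age: 5
'''

with_backslash = '''\
name: Little Joe
age: 5
'''


def line_of(text, line_text):
    """Return the 1-based line number on which line_text appears in text (hypothetical helper)."""
    return text.splitlines().index(line_text) + 1


print(repr(without_backslash[:1]))           # '\n': the string starts with an empty line
print(line_of(without_backslash, 'age: 5'))  # 3
print(line_of(with_backslash, 'age: 5'))     # 2
```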