Python Cerberus - Validating Schema with this Example


I am using Cerberus to validate the schema of a DataFrame. With the sample data and code below, I expect the if-else statement to print "Data structure is valid!", but it prints "Data structure is not valid." instead. Any insight would be appreciated.

import pandas as pd
from cerberus import Validator

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London']
})

data = df.to_dict()

schema = {
    'name': {'type': 'string'},
    'age': {'type': 'integer', 'min': 18},
    'city': {'type': 'string'}
}

validator = Validator(schema)
is_valid = validator.validate(data)

if is_valid:
    print("Data structure is valid!")
else:
    print("Data structure is not valid.")
    print(validator.errors)

Which results in:

>>> Data structure is not valid.
>>> {'age': ['must be of integer type'], 'city': ['must be of string type'], 'name': ['must be of string type']}

Solution

  • It fails because df.to_dict() returns a dictionary whose values are themselves dictionaries (one per column, keyed by row index), not strings or integers:

    data = df.to_dict()
    print(data)
    >>> {'name': {0: 'Alice', 1: 'Bob', 2: 'Charlie'}, 'age': {0: 25, 1: 30, 2: 35}, 'city': {0: 'New York', 1: 'Paris', 2: 'London'}}
    

    If you want your schema to validate this data, declare each field as a dict and apply the original rules to its values with valuesrules:

    schema = {
        "name": {"type": "dict", "valuesrules": {"type": "string"}},
        "age": {"type": "dict", "valuesrules": {"type": "integer", "min": 18}},
        "city": {"type": "dict", "valuesrules": {"type": "string"}},
    }