I am using Cerberus to validate the schema of a pandas DataFrame. With the sample data and code below, the if-else statement should print "Data structure is valid!", but it prints "Data structure is not valid." instead. Any insight would be appreciated.
import pandas as pd
from cerberus import Validator
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London']
})
data = df.to_dict()
schema = {
    'name': {'type': 'string'},
    'age': {'type': 'integer', 'min': 18},
    'city': {'type': 'string'}
}
validator = Validator(schema)
is_valid = validator.validate(data)
if is_valid:
    print("Data structure is valid!")
else:
    print("Data structure is not valid.")
    print(validator.errors)
Which results in:
>>> Data structure is not valid.
>>> {'age': ['must be of integer type'], 'city': ['must be of string type'], 'name': ['must be of string type']}
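It fails because df.to_dict() returns a dictionary whose values are themselves dictionaries (row index mapped to cell value), not the plain strings and integers your schema expects: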
data = df.to_dict()
print(data)
>>> {'name': {0: 'Alice', 1: 'Bob', 2: 'Charlie'}, 'age': {0: 25, 1: 30, 2: 35}, 'city': {0: 'New York', 1: 'Paris', 2: 'London'}}
If you want your schema to validate this data you need to change it to:
schema = {
    "name": {"type": "dict", "valuesrules": {"type": "string"}},
    "age": {"type": "dict", "valuesrules": {"type": "integer", "min": 18}},
    "city": {"type": "dict", "valuesrules": {"type": "string"}},
}