
JSON Schema Validation of a Decimal Number in a pandas DataFrame


The following Python script should validate the number of decimal places in the records. In the schema, I am trying to require at most 3 decimal places for "scores" by using "multipleOf": 0.001.

I have a record with 5 decimal places: "scores": [1.12345]

It should report an error, but it prints:

Validation ok
     scores
0 1.12345

How can I fix this?

import jsonschema
import pandas as pd

schema = {
    "type": "array",
    "properties": {"scores": {"type": "number", "multipleOf": 0.001}},
}


df = pd.DataFrame(
    {
        "scores": [1.12345],
    }
)

validator = jsonschema.Draft202012Validator(schema)

try:
    validator.validate(instance=df.to_dict("records"))
    print("Validation ok")

except jsonschema.ValidationError as e:
    print(f"Validation error: {e.message}")

print(df)

Solution

  • Your validation passes because the schema is structured incorrectly. The root declares "type": "array", but "properties" only applies to object instances, so it is ignored when the instance is an array. Your records are never checked against the "scores" rule, which is why any array is accepted.

    EDIT: I didn't notice you are using DataFrames; df.to_dict("records") turns them into an array instance (a list of row objects).

    Try this schema

    schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "scores": {
                    "type": "array",
                    "items": {"type": "number", "multipleOf": 0.001},
                },
            },
        },
    }
    

    This schema matches an instance in which each record's "scores" is itself an array, such as

    [{"scores": [1.000, 1.235, 2.333]}]