
Is there any way to define a scoping mechanism in JSON Schema for Arrays of Objects?


I would like to use JSON Schema to validate my data which exists as an array of objects. In this use-case, I have a list of people and I want to make sure they possess certain properties, but these properties aren't exhaustive.

For instance, if we have a person named Bob, I want to make sure that Bob's height, ethnicity and location are set to certain values. But I don't care much about Bob's other properties like hobbies, weight, relationshipStatus.

There is one caveat: there can be multiple Bobs, and I don't want to check all of them. Each person has a unique ID, and I want to check a person's properties by the specified id.

Here is an example of all the people that exist:

{
  "people": [
    {
      "name": "Bob",
      "id": "ei75dO",
      "age": "36",
      "height": "68",
      "ethnicity": "american",
      "location": "san francisco",
      "weight": "174",
      "relationshipStatus": "married",
      "hobbies": ["camping", "traveling"]
    },
    {
      "name": "Leslie",
      "id": "UMZMA2",
      "age": "32",
      "height": "65",
      "ethnicity": "american",
      "location": "pawnee",
      "weight": "139",
      "relationshipStatus": "married",
      "hobbies": ["politics", "parks"]
    },
    {
      "name": "Kapil",
      "id": "HkfmKh",
      "age": "27",
      "height": "71",
      "ethnicity": "indian",
      "location": "mumbai",
      "weight": "166",
      "relationshipStatus": "single",
      "hobbies": ["tech", "games"]
    },
    {
      "name": "Arnaud",
      "id": "xSiIDj",
      "age": "42",
      "height": "70",
      "ethnicity": "french",
      "location": "paris",
      "weight": "183",
      "relationshipStatus": "married",
      "hobbies": ["cooking", "reading"]
    },
    {
      "name": "Kapil",
      "id": "fDnweF",
      "age": "38",
      "height": "67",
      "ethnicity": "indian",
      "location": "new delhi",
      "weight": "159",
      "relationshipStatus": "married",
      "hobbies": ["tech", "television"]
    },
    {
      "name": "Gary",
      "id": "ZX43NI",
      "age": "29",
      "height": "69",
      "ethnicity": "british",
      "location": "london",
      "weight": "172",
      "relationshipStatus": "single",
      "hobbies": ["parkour", "guns"]
    },
    {
      "name": "Jim",
      "id": "uLqbVe",
      "age": "26",
      "height": "72",
      "ethnicity": "american",
      "location": "scranton",
      "weight": "179",
      "relationshipStatus": "single",
      "hobbies": ["parkour", "guns"]
    }
  ]
}

And here is what I specifically want to check for in each person:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "people": {
      "type": "array",
      "contains": {
        "anyOf": [
          {
            "type": "object",
            "properties": {
              "id": {
                "const": "ei75dO"
              },
              "name": {
                "const": "Bob"
              },
              "ethnicity": {
                "const": "american"
              },
              "location": {
                "const": "los angeles"
              },
              "height": {
                "const": "68"
              }
            },
            "required": ["id", "name", "ethnicity", "location", "height"]
          },
          {
            "type": "object",
            "properties": {
              "id": {
                "const": "fDnweF"
              },
              "name": {
                "const": "Kapil"
              },
              "location": {
                "const": "goa"
              },
              "height": {
                "const": "65"
              }
            },
            "required": ["id", "name", "location", "height"]
          },
          {
            "type": "object",
            "properties": {
              "id": {
                "const": "xSiIDj"
              },
              "name": {
                "const": "Arnaud"
              },
              "location": {
                "const": "paris"
              },
              "relationshipStatus": {
                "const": "single"
              }
            },
            "required": ["id", "name", "location", "relationshipStatus"]
          },
          {
            "type": "object",
            "properties": {
              "id": {
                "const": "uLqbVe"
              },
              "relationshipStatus": {
                "const": "married"
              }
            },
            "required": ["id", "relationshipStatus"]
          }
        ]
      }
    }
  },
  "required": ["people"]
}

Note that for Bob, I only want to check that his name in the records is Bob, his ethnicity is american and that his location and height are set properly.

For Kapil, notice that there are 2 of them in the record. I only want to validate the array object pertaining to Kapil with the id fDnweF.

And for Jim, I only want to make sure that his relationshipStatus is set to married.

So my question is: is there any way in JSON Schema to say "when you come across an array of objects, don't run validation against every element in the data; only run it against objects that match a specific identifier"? In our instance, the identifier is id. You can imagine that this identifier could be anything; for example, it could have been socialSecurity# if the list of people were all from America.

The issue with the current schema is that when it tries to validate the objects, it generates a giant list of errors with no clear indication of which object failed with which value.

In an ideal scenario AJV (which I currently use) would generate errors that should look something like:

---------Bob-------------
path: people[0].location
expected: "los angeles"

// Notice how this isn't Kapil at index 2 since we provided the id which matches kapil at index 4
---------Kapil-----------
path: people[4].location
expected: "goa"

---------Kapil-----------
path: people[4].height
expected: "65"

---------Arnaud----------
path: people[3].relationshipStatus
expected: "single"

-----------Jim-----------
path: people[6].relationshipStatus
expected: "married"

Instead, AJV currently spits out errors with no clear indication of where the failure might be. If Bob failed to match the expected value of location, it says that every person including Bob has an invalid location, which from our perspective is incorrect.

How can I define a schema that handles this use case, so that we can use JSON Schema to pinpoint which elements in our data aren't in compliance with what our schema states? We want to store these schema errors cleanly for reporting purposes and come back to these reports to see exactly which people (represented by index values of the array) failed which values.

Edit: Assume that we would also like to check Bob's relatives. For instance, we want to create a schema to check that one relative with a given ID is set to location: "los angeles" and another to "orange county".

{
    "people": [{
        "name": "Bob",
        "id": "ei75d0",
        "relationshipStatus": "married",
        "height": "68",
        "relatives": [
            {
                "name": "Tony",
                "id": "UDX5A6",
                "location": "los angeles"
            },
            {
                "name": "Lisa",
                "id": "WCX4AG",
                "location": "orange county"
            }
        ]
    }]
}

My question then would be: can if/then/else be applied to nested elements as well? I'm not having success, but I'll continue trying to get it to work and will post an update here if/once I do.


Solution

  • How can I define a schema that can resolve this use-case and we can use JSON Schema to pinpoint which elements in our data aren't in compliance with what our schema states

    It's a little fiddly, but I've gone from "this isn't possible" to "you can just about do this".

    If you re-structure your schema to the following...

    {
      "$schema": "https://json-schema.org/draft/2019-09/schema",
      "type": "object",
      "properties": {
        "people": {
          "type": "array",
          "items": {
            "allOf":[
              {
                "if": {
                  "properties": {
                    "id": {
                      "const": "uLqbVe"
                    }
                  },
                  "required": ["id"]
                },
                "then": {
                  "type": "object",
                  "properties": {
                    "id": {
                      "const": "uLqbVe"
                    },
                    "relationshipStatus": {
                      "const": "married"
                    }
                  },
                  "required": ["id", "relationshipStatus"]
                },
                "else": true
              }
            ]
          }
        }
      },
      "required": ["people"]
    }
    

    What we're doing here is, for each item in the array, if the object has the specific ID, then do the other validation, otherwise, it's valid.

    It's wrapped in an allOf so you can do the same pattern multiple times.
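
    As a rough sketch of what "multiple times" could look like, here is the same pattern repeated for two of the IDs from the question (Bob, ei75dO, and the second Kapil, fDnweF), with the expected values copied from the question's schema. Each if lists id under required so that an object with no id at all does not match the condition vacuously; the compact formatting is just a layout choice.

    ```json
    {
      "$schema": "https://json-schema.org/draft/2019-09/schema",
      "type": "object",
      "properties": {
        "people": {
          "type": "array",
          "items": {
            "allOf": [
              {
                "if": {
                  "properties": { "id": { "const": "ei75dO" } },
                  "required": ["id"]
                },
                "then": {
                  "properties": {
                    "name": { "const": "Bob" },
                    "ethnicity": { "const": "american" },
                    "location": { "const": "los angeles" },
                    "height": { "const": "68" }
                  },
                  "required": ["name", "ethnicity", "location", "height"]
                }
              },
              {
                "if": {
                  "properties": { "id": { "const": "fDnweF" } },
                  "required": ["id"]
                },
                "then": {
                  "properties": {
                    "name": { "const": "Kapil" },
                    "location": { "const": "goa" },
                    "height": { "const": "65" }
                  },
                  "required": ["name", "location", "height"]
                }
              }
            ]
          }
        }
      },
      "required": ["people"]
    }
    ```

    Because each then is only evaluated for the item whose id matched, failures should be reported against that item's index (for example people/4 for the second Kapil) rather than against every element.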

    The caveat is that if an expected ID is missing from the data, or is mistyped in your schema, no if condition matches and you will be told everything is valid.

    Ideally, you should additionally check that the IDs you are expecting are actually there. (It's fine to do so in the same schema.)
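
    One possible way to express that presence check, shown here for two of the question's IDs, is a contains per expected ID. They are wrapped in an allOf because a single schema object can only carry one contains keyword. This is a sketch rather than a tested schema; the items keyword with the if/then entries would sit next to this allOf under people.

    ```json
    {
      "$schema": "https://json-schema.org/draft/2019-09/schema",
      "type": "object",
      "properties": {
        "people": {
          "type": "array",
          "allOf": [
            {
              "contains": {
                "properties": { "id": { "const": "ei75dO" } },
                "required": ["id"]
              }
            },
            {
              "contains": {
                "properties": { "id": { "const": "fDnweF" } },
                "required": ["id"]
              }
            }
          ]
        }
      },
      "required": ["people"]
    }
    ```

    Each contains only asserts that at least one element carries the given id; the per-person property checks stay in the if/then entries.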

    You can see this mostly working if you test it on https://jsonschema.dev by removing the $schema property. (This playground is only draft-07, but none of the keywords you use need anything above draft-07 anyway.)

    You can test this working on https://json-everything.net/json-schema which then gives you full validation response.

    AJV by default doesn't give you all the validation results; it stops at the first error. There's an option to enable this (allErrors), but I'm not in a position to test the result myself right now.
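
    For the nested relatives case from the question's edit, the same if/then pattern can in principle be placed inside a then, since then is an ordinary subschema. The following is an untested sketch that uses the IDs and locations from the edit's sample data (ei75d0, UDX5A6, WCX4AG). Requiring relatives on the matched person is an assumption about what should happen when the property is absent.

    ```json
    {
      "$schema": "https://json-schema.org/draft/2019-09/schema",
      "type": "object",
      "properties": {
        "people": {
          "type": "array",
          "items": {
            "if": {
              "properties": { "id": { "const": "ei75d0" } },
              "required": ["id"]
            },
            "then": {
              "properties": {
                "relatives": {
                  "type": "array",
                  "items": {
                    "allOf": [
                      {
                        "if": {
                          "properties": { "id": { "const": "UDX5A6" } },
                          "required": ["id"]
                        },
                        "then": {
                          "properties": { "location": { "const": "los angeles" } },
                          "required": ["location"]
                        }
                      },
                      {
                        "if": {
                          "properties": { "id": { "const": "WCX4AG" } },
                          "required": ["id"]
                        },
                        "then": {
                          "properties": { "location": { "const": "orange county" } },
                          "required": ["location"]
                        }
                      }
                    ]
                  }
                }
              },
              "required": ["relatives"]
            }
          }
        }
      },
      "required": ["people"]
    }
    ```

    The outer if selects the person, and the inner allOf applies one if/then per relative, so a failure would be located under a path such as people/0/relatives/1/location.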