Tags: c#, json, json.net, jsonschema, json-deserialization

Using a JSON Schema, how can I filter out additional properties when parsing JSON to a JObject?


I am trying to parse only part of the provided JSON, using the Newtonsoft.Json.Schema NuGet package. In the following example, I want to deserialize only the name and age properties.

JSchema schema = JSchema.Parse(@"{
   'id': 'person',
   'type': 'object',
   'additionalProperties' : false,
   'properties': {
   'name': {'type':'string'},
   'age': {'type':'integer'}
   }
}");
            

JsonTextReader reader = new JsonTextReader(new StringReader(@"{
    'name': 'James',
    'age': 29,
    'salary': 9000.01,
    'jobTitle': 'Junior Vice President'
}"));

JSchemaValidatingReader validatingReader = new JSchemaValidatingReader(reader);
validatingReader.Schema = schema;

JsonSerializer serializer = new JsonSerializer();
JObject data = serializer.Deserialize<JObject>(validatingReader);

If I set 'additionalProperties' : true, the unneeded fields are deserialized as well.


But if I set 'additionalProperties' : false, I get an error:

Newtonsoft.Json.Schema.JSchemaValidationException: Property 'salary' has not been defined and the schema does not allow additional properties. Path 'salary', line 4, position 11.

Note that I will only know the required fields at run time. I receive a large JSON document and need a way to deserialize only part of it, with users deciding which properties are processed and which are not.


Solution

  • JSchemaValidatingReader represents a reader that provides JSchema validation. It only validates as it reads; it does not provide any sort of filtering capability.

    What you could do instead is to load your JSON into a JToken, validate with SchemaExtensions.IsValid(JToken, JSchema, out IList<ValidationError>), and then remove additional properties at the path indicated by ValidationError.Path.

    To do this, modify your code as follows:

    var data = JObject.Parse(jsonString); // The string literal from your question
    
    var isValid = data.IsValid(schema, out IList<ValidationError> errors);
    
    if (!isValid)
    {           
        foreach (var error in errors)
        {
            if (error.ErrorType == ErrorType.AdditionalProperties)
                data.SelectToken(error.Path)?.RemoveFromLowestPossibleParent();
        }
    }
    

    using the extension method:

    public static partial class JsonExtensions
    {
        public static JToken RemoveFromLowestPossibleParent(this JToken node)
        {
            if (node == null)
                return null;
            // If the parent is a JProperty, remove that instead of the token itself.
            var property = node.Parent as JProperty;
            var contained = property ?? node;
            if (contained.Parent != null)
                contained.Remove();
            // Also detach the node from its immediate containing property -- Remove() does not do this even though it seems like it should
            if (property != null)
                property.Value = null;
            return node;
        }
    }
    

    Notes:

    • When the error type is ErrorType.AdditionalProperties the Path will point directly to the unwanted property, but for other error types such as ErrorType.Required the path may point to the parent container. Thus you should check the error type before removing a token related to an error at a given path.

    • If your JSON is large, it is recommended to deserialize directly from a stream using JsonSerializer.CreateDefault().Deserialize<JToken>(reader) (as you are doing currently) to avoid loading the JSON into an intermediate string.
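To make the streaming note concrete, here is a minimal sketch that reads the JSON from a stream, validates it, and drops the properties reported as additional. The class name StreamFilterSketch and the method LoadAndPrune are hypothetical names used only for illustration, and the sketch removes the parent JProperty inline rather than through an extension method so that it stands on its own.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;

public static class StreamFilterSketch
{
    // Hypothetical helper (not part of Json.NET): loads JSON from a stream and
    // removes every property the schema reports as an additional property.
    public static JToken LoadAndPrune(Stream stream, JSchema schema)
    {
        using (var textReader = new StreamReader(stream))
        using (var jsonReader = new JsonTextReader(textReader))
        {
            // Deserialize straight from the reader, with no intermediate string.
            var data = JsonSerializer.CreateDefault().Deserialize<JToken>(jsonReader);
            if (data == null)
                return null; // Empty input: nothing to validate or prune.

            if (!data.IsValid(schema, out IList<ValidationError> errors))
            {
                foreach (var error in errors)
                {
                    // Only additional-property errors point at the unwanted property itself.
                    if (error.ErrorType != ErrorType.AdditionalProperties)
                        continue;

                    // SelectToken returns the property's value; its parent is the JProperty to drop.
                    var value = data.SelectToken(error.Path);
                    (value?.Parent as JProperty)?.Remove();
                }
            }
            return data;
        }
    }
}
```

With the schema and JSON from the question, the returned token should contain only name and age. Other validation errors (wrong types, missing required properties) are ignored here, so check the errors list separately if they matter to you.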

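Since you only know the wanted properties at run time, you may prefer to build the schema in code instead of parsing a schema string. The following is a sketch under that assumption; RuntimeSchemaSketch and BuildAllowList are hypothetical names, and each listed property gets an empty schema, so its value is not type-checked.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json.Schema;

public static class RuntimeSchemaSketch
{
    // Hypothetical helper: builds an object schema that allows only the given
    // property names and rejects everything else as an additional property.
    public static JSchema BuildAllowList(IEnumerable<string> propertyNames)
    {
        var schema = new JSchema
        {
            Type = JSchemaType.Object,
            AllowAdditionalProperties = false
        };

        foreach (var name in propertyNames)
        {
            // An empty JSchema places no constraint on the property's value.
            schema.Properties[name] = new JSchema();
        }

        return schema;
    }
}
```

For the example in the question, RuntimeSchemaSketch.BuildAllowList(new[] { "name", "age" }) yields a schema that can be passed to IsValid in place of the parsed one. If you also need the type checks from the original schema, assign a JSchema with the appropriate Type to each property instead of an empty one.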