JsonPath with JsonTextReader: Token at a Time

I am having an issue with JsonPath behaving differently when I load one token at a time (.Load) using JsonTextReader versus loading the entire JSON using ReadFrom. Here is an example:

Path = "[*].person"
Method = SelectTokens(path)
JSON:

[
  {
    "person": {
      "personid": 123456
    }
  },
  {
    "person": {
      "personid": 798
    }
  }
]

When using .ReadFrom, it'll return the proper 2 elements. If I use .Load though, it'll return 0 elements. However, if I change the path to "person", .ReadFrom returns 0 elements while .Load returns 2 elements.

As a workaround, I could strip everything up to and including the first "." from the path, i.e. path = path.Substring(path.IndexOf(".") + 1); however, this feels more like a hack than a proper fix. I would, of course, also need to ensure that the JSON root is an array, but in most of my cases it is.

So finally, I am trying to learn how to use JSON Path with arrays when loading a token at a time. Any recommendations?

Full Code

Full JSON

Solution

The code you linked to reads tokens until it encounters an object and then loads a JToken from that object, which reads ahead to the end of the object. What you end up with is one JToken per item in the root array. For each of those JTokens you can then call:

token.SelectTokens("person").OfType<JObject>()

because you know that the property contains an object.

That is the equivalent of doing "[*].person" JsonPath on the whole parsed JSON.
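
As a minimal sketch of the loop described above (the linked code is not reproduced here, so the helper name ReadPeople and the inline JSON are assumptions for illustration only), loading one root-array item at a time and querying it with a relative path might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class PerItemExample
{
    // Hypothetical helper: yields every "person" object found in the
    // items of a root JSON array, loading one item at a time.
    public static IEnumerable<JObject> ReadPeople(TextReader textReader)
    {
        using (var reader = new JsonTextReader(textReader))
        {
            while (reader.Read())
            {
                // Skip the root StartArray/EndArray; act only when an item starts.
                if (reader.TokenType != JsonToken.StartObject)
                    continue;

                // Load reads ahead to the matching EndObject, so `item`
                // is one complete element of the root array.
                JToken item = JToken.Load(reader);

                // The path is relative to the item, so no "[*]." prefix is needed.
                foreach (var person in item.SelectTokens("person").OfType<JObject>())
                    yield return person;
            }
        }
    }

    public static void Main()
    {
        const string json =
            @"[{""person"":{""personid"":123456}},{""person"":{""personid"":798}}]";

        foreach (var person in ReadPeople(new StringReader(json)))
            Console.WriteLine(person["personid"]);
    }
}
```

This also suggests why the two paths behave differently in the question: "[*].person" expects the token it is applied to to be an array, while each token loaded this way is a single object, so only the relative path "person" finds anything.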

I hope I have understood your question correctly. If not, please let me know =)

Update:

Based on your comments I understand what you are after. What you could do is create a method like this:

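using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public IEnumerable<JToken> GetTokensByPath(TextReader tr, string path)
{
    // Do our best to convert the path to a regex: escape it so "." and "[" are
    // literal, let "[*]" match any array index, and anchor it to the whole path.
    var regex = new Regex("^" + Regex.Escape(path).Replace(@"\[\*]", @"\[[0-9]+\]") + "$");
    using (var reader = new JsonTextReader(tr))
    {
        while (reader.Read())
        {
            // A property name reports the same path as its value; skip the name
            // so that the value itself is loaded rather than a JProperty.
            if (reader.TokenType != JsonToken.PropertyName && regex.IsMatch(reader.Path))
                yield return JToken.Load(reader);
        }
    }
}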

I match the reader's path against a regex built from the JSONPath input. Ideally this would handle all of the JSONPath grammar, but at the moment I only support *. This approach is useful when you have a massive file and a deep JSONPath selector: the stream stays open longer if you enumerate slowly, but peak memory usage is much lower.
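
The matching above relies on the Path that JsonTextReader reports as it advances. As a rough illustration (the Trace helper and the class name are hypothetical, not part of Json.NET), the following sketch records the token type and path of every token so you can see which strings a regex would be compared against:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class PathTrace
{
    // Hypothetical helper: returns one "TokenType Path" line per token read.
    public static List<string> Trace(string json)
    {
        var lines = new List<string>();
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            while (reader.Read())
                lines.Add($"{reader.TokenType,-12} {reader.Path}");
        }
        return lines;
    }

    public static void Main()
    {
        const string json =
            @"[{""person"":{""personid"":123456}},{""person"":{""personid"":798}}]";

        foreach (var line in Trace(json))
            Console.WriteLine(line);
    }
}
```

For the sample JSON, the items of the root array should show up with paths such as [0] and [1], and the nested objects as [0].person and [1].person. A property name token and the value that follows it appear to report the same path, which is worth keeping in mind when deciding which token to pass to JToken.Load.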

I hope this helps.