Search code examples
jsonperformancedictionaryparsingjavascript-objects

Vanilla JS (FUN ONE): Best way to access a deeply-buried JSON node from a string "map"


The Lead-Up

I just bumped into (a far less contrived version of) the scenario I laid out below. There's dozens of ways I can think of to handle it - everything from recursive functions to rendering it to the DOM and using a treewalker - and it wouldn't even surprise me to learn there's some obscure piece of JS esoterica that handles this precise problem, and that I simply don't know about.

Thing is, it's more of a noodler than you might think. Read on. I promise it'll hook you.

See, I've been a JS dev for 25 years now. Increasingly, part of my duties increasingly involve interviewing potential front-end devs. I like this problem enough to introduce it to my repertoire as one of my "see how the candidate thinks" interview questions. I'd really appreciate any input, and I'd love to see how folks would tackle this in the wild.

The Data Structure

Sooooo... let us say I have a massive (but valid) JSON object.

Let's call it n layers deep (say a relational database dump or the like. Use case is irrelevant here; it's the tactic I'm interested in), comprised of varying data types (e.g. array, object, string, number, boolean, whatever).

Sample Object

    const singledOut = {
    planets: [
        // Author's note: to distinguish these ellipses from the spread operator, I'll italicize 'em :)
        ...
        {"Venus": {...}}, 
        {"Earth": {
            ...
            "composition": [...],
            "continents":  [
                ...
                "Europe":     {...},
                "NorthAmerica"   {
                    ...
                    "climates":    [...],
                    "countries":   [
                        "Canada":        {...}   // ...and so on; you get the idea. Once we've drilled
                        "Mexico":        {...}   // far enough down to get good and granular, assume we'd
                        "UnitedStates":  {...}   // start seeing simple data types (string, numbers, etc)
                                ],
                    "currencies":  [...],            
                    ...
                            },
                "SouthAmerica"   {...},
                ...
            ],
            "craters":     [...]
            ...
        {"Mars":  {...}},
        ...
    ]
}

For purposes of this exercise, we may assume that we have - already loaded into memory - access to the JSON object in question (that is to say, some extremely-poorly-designed API vomited us back this big wankin' object that we need one trivial bit of data from) but we DO know the path to it!


The Locator

Now let us presuppose I have a dynamically-generated, delimited path in the form of a string to some node's "address" contain therein. Let's say we use tildes (~) as the delimiter, and let us also declare that the data source is free from those characters (or they're encoded), so as to preclude concerns over data pollution/need to cope with escape characters.

Sample "Path" to Some Deeply-Buried Data Tidbit

"planets~Earth~continents~NorthAmerica~countries~UnitedStates~states~California~ ... ~Disneyland~fauna~giant mice"

Assume the data we're after is quantity of giant mice inhabiting the Disneyland in Anaheim, CA, US, North America, Earth


The PUZZLE Bit

So really, there are four questions buried here.

  1. You don't need to answer them all, or, should you elect to,
  2. The same answer need not be applicable to all 4 scenarios (though if you can think of one I'll be impressed as hell).

SCENARIOS:

  • SCENARIO ONE: Laid out exactly as above. You have your object, you have your string path. How do we apply the string to the JSON to get the nugget we want?

  • SCENARIO TWO: Also the same as above, but this time we have only the VALUE portions of the KVP's: `Earth~NorthAmerica~UnitedStates~California~Orange~Anaheim~TouristTraps~Disneyland~ROUS's

  • SCENARIOS THREE AND FOUR: Same as ONE and TWO... but we're forced to start with that massive honkin' object as a STRING (JSON.stringify'd).

The objective in all 4 is to retrieve the data tidbit in the fewest steps/operations with the smallest memory footprint we can manage (beyond what's already been consumed, that is). In short, a recursive function that uses the JSON.Parse(JSON.stringify()) chicanery (which, while it would totally work, might not be the best call memory-wise with n levels of recursion, yanno?).

We may assume that the data we seek lives in its own unique node (e.g. it's not nestled inside some enumerated list buried inside said node; nothing like "1 dogs, 4 ducks 2 mice, 75 golden camels, 53 purple peacocks, 95 white Persian monkeys, 1 tiger... and Goofy"), but the possibility DOES exist that the data sought is pre-or suffixed in some way ("There are known to be 2", "2 giant mice", or "Disneyland shelters 2 big-ass rodents")

The tricky bit is even just getting the operation to turn the string version of "a~sample~path~like~this" into the KEY version of ["a"]["sample"]["path"]["like"]["this"] is frustratingly-inefficient in almost all of the solutions I've come up with.

So I figured what the hell: The best devs ON planet~Earth live on this site. I'll ask them. All that said, any form of JS voodoo is fair game, but if there's some oddball library out there that does this specifically, don't forget factor its weight and footprint into your answer's. Including all of, say ExtJS, just for this one problem is likely not a great solution as such things go.

I can't WAIT to see what this turns out! Besides, you never know: maybe you'll be my next interview!


Solution

  • I don't claim to be a javascript expert, much less "one of the best devs ON planet~Earth", but the answer to your question about converting "a~sample~path~like~this" into the KEY version of ["a"]["sample"]["path"]["like"]["this"] seems straight-forward enough:

    "a~sample~path~like~this".split('~').reduce((a,v)=>a[v], bigObject)
    

    That doesn't come close to solving your original question, where it is not sufficient to just index by a key. However, it could server as a basis, because all that is needed is to modify the reduction function to be a bit more general. But I find it a little difficult to answer because I don't understand what you intend by, for example,

    "countries": [
        "Canada":  {...}
        "Mexico":  {...}
        "UnitedStates":  {...}
    ]
    

    which is far from being a valid JSON (or Javascript) object, since arrays cannot have explicit keys (and elements of arrays need to be separated with commas). If you meant this:

    "countries": [
        { "Canada":  {...} },
        { "Mexico":  {...} },
        { "UnitedStates":  {...} },
    ]
    

    (which is, at best, an eccentric representation), then you could use ], "a~sample~path~like~this" .split('~') .reduce((a,v)=>a instanceof Array ? a.find(k=>k[v])[v] : a[v], bigObject)

    Alternatively, for this representation (which I have seen in the wild):

    "countries": [
        { "label": "Canada", ...},
        { "label": "Mexico", ...},
        { "label": "UnitedStates", ...},
    ]
    

    you could use something like:

    "a~sample~path~like~this"
       .split('~')
       .reduce((a,v)=>a instanceof Array ? a.find(k=>k.label==v) : a[v],
               bigObject)
    

    An even more sophisticated reduction function might involve trying a depth-first search through aggregates to see if there is a nested value with the desired key, or value, or whatever.

    For the case where the JSON object is still a string, the best option IMHO would be to use a streaming JSON parser (Google found me clarinet, which looks plausible). Precise implementation details would depend on the particularities of the JSON parser, but the basic idea would be to put the key sequence into something which can be indexed (probably using split as above to produce an Array), and then maintain an index into this sequence while the parser explores the object. That's not as easy as it sounds, I guess, but it's not too difficult either. The simple solution is to keep a stack of key indices. When the parser indicates that it is about to explore a sub-element, the current index is pushed onto the stack; when the parser indicates that it has finished exploring a sub-element, the stack is popped. When the parser indicates that a key has been encountered, and it matches the current target, the current index is incremented. That case also needs to be tested when the stack is popped; if the value popped off of the stack is the length of the key sequence, the value just finished is the desired result of the search.