Search code examples
javascriptjsoncachingrecursionlarge-data

Caching a "deep" JSON object using GUIDs as keys


So I have a large, simple object that was loaded in from a JSON file in my JavaScript application.

This file has about 9 MB of data (should be lower once I minify it though) and is a nested structure like so:

{
    "guid": "guid 1 here",
    "children": [
        {
            "guid": "guid 2 here",
            "other": "properties",
            "here": true,
            "children": [
                {
                    "guid": "guid 3 here",
                    ...
                },
                ...
            ]
        },
        ...
    ]
}

I am not aware of the depth of this object and I need to use a generic function that locates a node based on its GUID property regardless of its depth in the tree. This recursive function (which I know could be optimized using a while loop instead of recursion, but regardless it is expensive) is slow.

I am wondering if initially after loading this object, I create a cache structure like this:

var cache = {
    "guid 1 here": [reference to object],
    "guid 2 here": [reference to object],
    "guid 3 here": [reference to object]
};

This, I would think, would make finding objects quicker, since I can just say

var node = cache[guid];

However, would this actually end up being a performance increase, or could this potentially cause memory problems? I've never dealt with a variable like cache, where there would potentially be hundreds of thousands of properties.

Would this help or hinder the situation?

Thanks for your advice as always, SO, you guys are amazing.


Solution

  • You have made the right choice.

    Object references are reasonably small. Even with thousands of objects such an object (which is stored internally as an array-like hash map) should not increase memory use significantly -- a few MB max. When you create your cache you are not making a copy of the objects. You are putting a pointer into the object (array style) that points to the object in the structure you've deserialized. So you aren't making a second copy of all the data. Just a cache of GUID and pointer.

    This cache method uses a great feature of JavaScript, which is that the property indices are kept sorted internally. Lookups by property index (e.g., cache[guid] ) are then performed with a binary search. That will be orders of magnitude faster than a loop or recursive search on unsorted data.