Tags: reactjs, json, memory, ag-grid, ag-grid-react

Ag-grid: loading large amounts of data


For a bit of background: (1) I have never used ag-grid, and (2) I've taken over a project from another developer that uses ag-grid to display data. The decompressed data for certain reports can be up to 1 GB, and loading that much JSON into the component can crash the browser.

On the ag-grid website there is a sample with 100,000 rows and 22 columns that loads very quickly and doesn't cause any memory problems in my browser, and I'm somewhat puzzled as to how this is done. This is the example:

https://www.ag-grid.com/example

I'm curious which approaches are available for dealing with very large datasets and displaying them in the browser using ag-grid. I'm worried there is a larger architectural problem, that the entire dataset perhaps shouldn't be loaded into the browser at all, and that ag-grid may not be an appropriate tool for data of this size. Any clarification or help would be greatly appreciated. Thanks!


Solution

  • So there are multiple angles to this type of performance problem.

    1. Rendering thousands of DOM nodes in a browser is slow. Even if you generated random data for 100,000 rows on the client side (i.e., simulating a situation where you don't need to download a 1 GB data model into the browser, since it's just random values generated in JS), it would still perform very badly without certain techniques, which ag-grid uses.

    2. Even if you've solved (1), 1 GB is far too much data to keep in the memory of a single browser tab. So the data model itself would also cause a performance problem, never mind the rendering.

    (1) is solved by a technique called scroll virtualisation. There are plenty of libraries for this, such as TanStack Virtual, and ag-grid implements its own version of the technique. Essentially, only the rows that fit on the user's screen are rendered. Without virtualisation, the browser spends resources rendering rows the user can't see because they have scrolled off the screen; with virtualisation in place, rows are swapped out for new ones within the visible area as the user scrolls.

    You can see this in action by using the DOM inspector to target the rows: notice that as you scroll up and down, the rows are replaced dynamically.
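
    To make the idea concrete, here is a minimal TypeScript sketch of the core calculation behind row virtualisation. It is purely illustrative (the names and the fixed row height are assumptions, and ag-grid's real implementation is far more sophisticated), but it shows why only a handful of rows ever need DOM nodes:

    ```ts
    interface VisibleRange {
      firstRow: number;
      lastRow: number;
    }

    // Given the viewport's scroll position, work out which rows actually need
    // to exist in the DOM. Everything outside this window (plus a small
    // overscan buffer) can be removed or recycled as the user scrolls.
    function getVisibleRange(
      scrollTop: number,
      viewportHeight: number,
      rowHeight: number,
      totalRows: number,
      overscan = 5
    ): VisibleRange {
      const firstRow = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
      const lastRow = Math.min(
        totalRows - 1,
        Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan
      );
      return { firstRow, lastRow };
    }

    // 100,000 rows of 30px in a 600px viewport => roughly 30 rows in the DOM:
    // getVisibleRange(45000, 600, 30, 100000) -> { firstRow: 1495, lastRow: 1525 }
    ```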

    (2) is the big issue in your case. 1 GB is exceptionally large and you could never hope to load all of it in one go. This is usually solved by some form of pagination or streaming. ag-grid alone won't help here; you'll need a back-end rethink as well. You would almost certainly need to store the data in a database so that your back end can paginate it out to the front end. Note that "paginate" is an old term that has survived to this day. It doesn't mean you need Google-style page selection; it just means your server exposes an API that returns the data in chunks. The UI can still fetch these seamlessly as the user scrolls.
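
    On the back end, "serving in chunks" can be as simple as an endpoint that accepts a row range. Here is a hedged sketch using Express; the route name, query parameters and fetchRowsFromDb helper are hypothetical, not something that already exists in your project:

    ```ts
    import express from "express";

    const app = express();

    // Stub standing in for a real database query; the assumption is that the
    // data lives in a database that can return an arbitrary slice cheaply,
    // e.g. SELECT ... ORDER BY id LIMIT $limit OFFSET $offset (or keyset pagination).
    async function fetchRowsFromDb(offset: number, limit: number) {
      return { rows: [] as unknown[], totalCount: 0 };
    }

    // The grid asks for a range, e.g. GET /api/report-rows?start=2000&end=2100,
    // and only that slice is read from the database and sent over the wire.
    app.get("/api/report-rows", async (req, res) => {
      const start = Number(req.query.start ?? 0);
      const end = Number(req.query.end ?? 100);
      const { rows, totalCount } = await fetchRowsFromDb(start, end - start);
      res.json({ rows, totalCount });
    });

    app.listen(3000);
    ```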

    So as the user scrolls, the client downloads a small subset of the data on demand to populate the next batch of rows. Sometimes this is called "lazy loading". With the amount of data you have, you'd probably also need to drop older pages from memory once a certain number of rows is cached on the client side, and fetch them again if the user scrolls back to that point. The reason for doing that is to constrain the memory footprint of the tab, which could otherwise crash.
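
    ag-grid's community edition ships an Infinite Row Model built around exactly this pattern: you hand the grid a datasource, and it calls getRows with the row range it needs whenever the user scrolls into territory it hasn't cached. A sketch (the /api/report-rows endpoint is the hypothetical one above; check the exact interface against the ag-grid version your project uses):

    ```ts
    import type { IDatasource, IGetRowsParams } from "ag-grid-community";

    const datasource: IDatasource = {
      // Called by the grid whenever it needs a block of rows it hasn't cached.
      getRows: (params: IGetRowsParams) => {
        fetch(`/api/report-rows?start=${params.startRow}&end=${params.endRow}`)
          .then((resp) => resp.json())
          .then(({ rows, totalCount }) => {
            // Passing the total row count lets the grid size its scrollbar correctly.
            params.successCallback(rows, totalCount);
          })
          .catch(() => params.failCallback());
      },
    };
    ```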

    I suspect the example on the ag-grid website avoids this because it generates the data synthetically on the client side, on the fly: it doesn't make any network requests when switching between tables of different sizes. So that example isn't dealing with (2) at all. Libraries like ag-grid can be used alongside the chunked approach, though, and they likely provide examples of this too.

    Downloading a whole 1 GB of data would never work, even ignoring the obvious usability problem of waiting many minutes for the page to load: it's simply too much memory. Your server has to serve the data incrementally, on demand, so the client can request just the pieces it needs at any given time while staying within a reasonable memory budget.

    If you have this, and some virtualisation to manage the rendering performance (like ag-grid or another library), it can be made to work in a performant way.

    So looping back to your original points, the reason it crashes is that ag-grid is being used in such a way that it only solves (1) and not (2). You need (2) as well for it to work without crashing with this level of data, and that requires:

    1. Back-end changes to be able to incrementally serve the data in chunks over your API, in ranges requested from the browser.
    2. Integration of ag-grid with that back-end API (a rough sketch follows below).
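
    As a rough illustration of what the ag-grid side of that integration can look like, here is a sketch using ag-grid's community Infinite Row Model together with the datasource sketched earlier (option names should be verified against the ag-grid version your project actually uses):

    ```tsx
    import { AgGridReact } from "ag-grid-react";
    import type { ColDef } from "ag-grid-community";
    // Hypothetical module containing the getRows datasource shown earlier.
    import { datasource } from "./reportDatasource";

    // Hypothetical columns; replace with the report's real fields.
    const columnDefs: ColDef[] = [{ field: "id" }, { field: "name" }, { field: "amount" }];

    export function ReportGrid() {
      return (
        <div className="ag-theme-alpine" style={{ height: 600 }}>
          <AgGridReact
            columnDefs={columnDefs}
            rowModelType="infinite"  // fetch rows in blocks via the datasource
            datasource={datasource}  // requests row ranges from the back end
            cacheBlockSize={100}     // rows requested per chunk
            maxBlocksInCache={10}    // drop old blocks so the tab's memory stays bounded
          />
        </div>
      );
    }
    ```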