c# json json.net

Out of memory exception while loading large json file from disk


I have a 1.2 GB JSON file which, when deserialized, should give me a list of 15 million objects.

The machine I'm deserializing it on is a 64-bit Windows Server 2012 box with 16 cores and 32 GB of RAM.

The application is built targeting x64.

Despite this, when I try to read the JSON document and convert it to a list of objects, I get an out-of-memory exception. Task Manager shows that only 5 GB of memory is in use.

The code I tried is below.

a.

    string plays_json = File.ReadAllText("D:\\Hun\\enplays.json");

    plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);

b.

    string plays_json = "";
    using (var reader = new StreamReader("D:\\Hun\\enplays.json"))
    {
        plays_json = reader.ReadToEnd();
        plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
    }

c.

    using (StreamReader sr = File.OpenText("D:\\Hun\\enplays.json"))
    {
        StringBuilder sb = new StringBuilder();
        sb.Append(sr.ReadToEnd());
        plays_json = sb.ToString();
        plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
    }

All help is sincerely appreciated.


Solution

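  • The problem is that you are reading the entire file into memory as one string and then deserializing it all at once into one huge list. A .NET string stores UTF-16 characters, so 1.2 GB of mostly ASCII text needs roughly 2.4 GB as a single string, and by default a single object cannot exceed 2 GB even in a 64-bit process; that is why the exception appears long before the machine's RAM is used up. You should use a StreamReader to process the file incrementally instead. Example (b) in your question doesn't do that, even though it uses a StreamReader, because it still reads the entire file via ReadToEnd(). Do something like this instead: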

    using (StreamReader sr = new StreamReader("D:\\Hun\\enplays.json"))
    using (JsonTextReader reader = new JsonTextReader(sr))
    {
        var serializer = new JsonSerializer();

        // Advance token by token; only one object is materialized at a time
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                // Deserialize each object from the stream individually and process it
                playdata item = serializer.Deserialize<playdata>(reader);

                ProcessPlayData(item);
            }
        }
    }

    The ProcessPlayData method should process a single playdata object and then, ideally, write the result to a file or a database rather than to an in-memory list; otherwise you may end up in the same situation again. If you must keep the results of processing each item in memory, consider a linked list or a similar structure that does not allocate its storage as one contiguous block and does not need to reallocate and copy when it grows.
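One way to package the streaming loop described above is as a lazy iterator, so the calling code can use a plain foreach and write each result out as it goes. The following is only a sketch under stated assumptions: `JsonArrayStreamer`, `ReadItems` and the `PlayData` class with its `Title` and `Plays` properties are hypothetical names (the question does not show the real `playdata` class), the input is assumed to be a top-level JSON array of objects, and a small in-memory string stands in for the real file so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical shape; the question does not show the real playdata class.
public class PlayData
{
    public string Title { get; set; }
    public int Plays { get; set; }
}

public static class JsonArrayStreamer
{
    // Hypothetical helper: lazily yields each object of a top-level JSON array.
    // Only the item currently being consumed needs to be alive.
    public static IEnumerable<T> ReadItems<T>(TextReader source)
    {
        // Disposing the JsonTextReader also closes the underlying TextReader.
        using (var reader = new JsonTextReader(source))
        {
            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    // Consumes exactly one object, including any nested objects.
                    yield return serializer.Deserialize<T>(reader);
                }
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for File.OpenText("D:\\Hun\\enplays.json").
        const string json =
            "[{\"Title\":\"a\",\"Plays\":1},{\"Title\":\"b\",\"Plays\":2}]";

        long total = 0;
        // Stand-in for a StreamWriter on an output file, or a database insert.
        using (var output = new StringWriter())
        {
            foreach (PlayData play in
                     JsonArrayStreamer.ReadItems<PlayData>(new StringReader(json)))
            {
                output.WriteLine(play.Title + "\t" + play.Plays);
                total += play.Plays;
            }
            Console.Write(output.ToString());
        }
        Console.WriteLine("total=" + total);
    }
}
```

With the real file, the `StringReader` would be replaced by `File.OpenText(...)` and the `StringWriter` by a `StreamWriter`, so neither the input text nor the results are ever held in memory as a whole. Calling `ToList()` on the iterator would bring the original memory pressure back.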