c# litedb

High memory usage when retrieving values from database


I have a project where I have to store 16 objects, each containing a list of 185,000 doubles. The saved object should be around 20-30 MB overall (sizeof(double) * 16 * 185,000 is about 23.7 MB), but when I try to retrieve it from the database, the database allocates 200 MB to retrieve this 20-30 MB object.

My questions are:

  1. Is this expected behaviour?
  2. How can I avoid such huge allocation of memory when I just want to retrieve one document?

Here is a fully reproducible example and a profiler screenshot:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using LiteDB;

class Program
{
    private static string _path;

    static void Main(string[] args)
    {
        _path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "testDb");

        // Comment after first insert to avoid adding the same object.
        AddData();

        var data = GetData();

        Console.ReadLine();
    }

    public static void AddData()
    {
        var items = new List<Item>();
        for (var index = 0; index < 16; index++)
        {
            var item = new Item {Values = Enumerable.Range(0, 185_000).Select(v => (double) v).ToList()};
            items.Add(item);
        }
        var testData = new TestClass { Name = "Test1", Items = items.ToList() };

        using (var db = new LiteDatabase(_path))
        {
            var collection = db.GetCollection<TestClass>();
            collection.Insert(testData);
        }
    }

    public static TestClass GetData()
    {
        using (var db = new LiteDatabase(_path))
        {
            var collection = db.GetCollection<TestClass>();
            // This line causes a huge memory allocation and triggers the garbage collector many times.
            return collection.FindOne(Query.EQ(nameof(TestClass.Name), "Test1"));
        }
    }
}

public class TestClass
{
    public int Id { get; set; }
    public string Name { get; set; }
    public IList<Item> Items { get; set; }
}

public class Item
{
    public IList<double> Values { get; set; }
}

Changing 185_000 to 1_850_000 pushes my RAM usage above 4 GB(!)

Profiler: Profiler image


Solution

  • There are several reasons why LiteDB allocates much more memory than a plain List<double>.

    To understand this, you need to know that your typed classes are converted into a BsonDocument structure (made of BsonValues). This structure has an overhead (+1 or +5 bytes per BsonValue).

    Also, to serialize this class (when you insert), LiteDB must create one single byte[] holding the whole BsonDocument (in BSON format). Afterwards, this very large byte[] is copied into many extend pages (each page contains a byte[4070]).

    On top of that, LiteDB must also keep track of the original data to store it in the journal area, so this size can double.

    To deserialize, LiteDB must do the inverse process: read all pages from disk into memory, join all pages into a single byte[], deserialize that into a BsonDocument, and finally map it to your class.
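To get a feel for the intermediate copies described above, here is a minimal sketch that runs the same steps by hand for a single Item: typed object to BsonDocument, BsonDocument to one contiguous byte[], and back. It assumes the public BsonMapper.Global and BsonSerializer helpers of the LiteDB package; the SizeProbe class is a hypothetical name, and the printed sizes depend on the LiteDB version, so no output is shown. Note that the BSON specification encodes array elements with their index as a string key, which is one reason the buffer tends to be larger than the raw payload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using LiteDB;

public class Item
{
    public IList<double> Values { get; set; }
}

// Hypothetical helper, not part of LiteDB.
public static class SizeProbe
{
    public static void Main()
    {
        var item = new Item
        {
            Values = Enumerable.Range(0, 185_000).Select(v => (double)v).ToList()
        };

        // Payload size if the doubles were stored back to back.
        long rawBytes = (long)sizeof(double) * item.Values.Count;

        // Step 1: the typed object becomes a tree of BsonValue instances.
        BsonDocument doc = BsonMapper.Global.ToDocument(item);

        // Step 2: the whole tree is written into one contiguous byte[].
        byte[] bson = BsonSerializer.Serialize(doc);

        Console.WriteLine($"raw doubles: {rawBytes:N0} bytes");
        Console.WriteLine($"BSON buffer: {bson.Length:N0} bytes");

        // Reading reverses the steps: byte[] -> BsonDocument -> typed object.
        BsonDocument roundTrip = BsonSerializer.Deserialize(bson);
        Item copy = BsonMapper.Global.ToObject<Item>(roundTrip);
        Console.WriteLine($"values after round trip: {copy.Values.Count:N0}");
    }
}
```

While this runs, the original list, the BsonDocument tree, the byte[] and the round-tripped copy are all alive at once, which is the kind of multiplication the answer describes, before pages and journal data are even counted.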

    These operations are fine for small objects. The memory is reused for each new document read/write, so memory usage stays under control.

    In the next version, v5, this process has some optimizations, like:

    • Deserialization does not need to allocate all the data into a single byte[] to read a document. This can be done using the new ChunkStream(IEnumerable<byte[]>). Serialization still needs this single byte[].
    • The journal file was changed to a WAL (write-ahead log), so there is no need to keep the original data.
    • ExtendPage is no longer stored in the cache.
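The idea behind reading from a sequence of pages instead of one joined byte[] can be sketched as follows. This is not LiteDB's ChunkStream; ChunkedReadStream, Demo and Pages are hypothetical names, and the 12-byte "pages" are deliberately tiny so that values straddle page boundaries. It only illustrates how a forward-only Stream can hand out bytes from an IEnumerable<byte[]> one page at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustrative only: a read-only, forward-only stream over a lazy sequence of pages.
public sealed class ChunkedReadStream : Stream
{
    private readonly IEnumerator<byte[]> _chunks;
    private byte[] _current = Array.Empty<byte>();
    private int _offset;

    public ChunkedReadStream(IEnumerable<byte[]> chunks)
    {
        _chunks = chunks.GetEnumerator();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int copied = 0;
        while (copied < count)
        {
            if (_offset == _current.Length)
            {
                // Current page is exhausted: pull the next one, or stop at the end.
                if (!_chunks.MoveNext()) break;
                _current = _chunks.Current;
                _offset = 0;
                continue;
            }
            int n = Math.Min(count - copied, _current.Length - _offset);
            Buffer.BlockCopy(_current, _offset, buffer, offset + copied, n);
            _offset += n;
            copied += n;
        }
        return copied;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _chunks.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    public static void Main()
    {
        // Write 6 doubles (48 bytes), then cut them into 12-byte pages.
        var source = new double[] { 0, 1, 2, 3, 4, 5 };
        var memory = new MemoryStream();
        var writer = new BinaryWriter(memory);
        foreach (var value in source) writer.Write(value);
        writer.Flush();

        using (var reader = new BinaryReader(new ChunkedReadStream(Pages(memory.ToArray(), 12))))
        {
            for (int i = 0; i < source.Length; i++)
                Console.WriteLine(reader.ReadDouble());
        }
    }

    // Stand-in for pages that would be read lazily from disk.
    public static IEnumerable<byte[]> Pages(byte[] data, int pageSize)
    {
        for (int start = 0; start < data.Length; start += pageSize)
        {
            var page = new byte[Math.Min(pageSize, data.Length - start)];
            Buffer.BlockCopy(data, start, page, 0, page.Length);
            yield return page;
        }
    }
}
```

Because the pages are pulled through an enumerator, only the page currently being read has to be held by the stream, instead of a second buffer the size of the whole document.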

    For future versions I am thinking of using the new Span<T> type to reuse previous memory allocations, but I need to study this more.


    But storing a single document with 185,000 values is not the best solution in any NoSQL database. MongoDB limits BSON document size to 16 MB (and early versions had a ~368 KB limit)... I limited LiteDB to 1 MB in v2, but I removed this size check and only kept it as a recommendation to avoid large single documents.

    Try splitting your class into 2 collections: one for your data and another for each value. You can also split this large array into chunks, like LiteDB FileStorage or MongoDB GridFS.
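A minimal sketch of the chunked layout suggested above, assuming the LiteDB typed-collection API (GetCollection, Insert, EnsureIndex, Find with a lambda). The TestHeader and ValueChunk classes, the collection names "headers" and "chunks", and the chunk size of 10,000 are all assumptions made for illustration and are not taken from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using LiteDB;

// Hypothetical document shapes for the split layout.
public class TestHeader
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int ItemCount { get; set; }
}

public class ValueChunk
{
    public int Id { get; set; }
    public int HeaderId { get; set; }   // which TestHeader this chunk belongs to
    public int ItemIndex { get; set; }  // which of the items it belongs to
    public int Sequence { get; set; }   // order of the chunk within the item
    public double[] Values { get; set; }
}

public static class ChunkedStore
{
    // Assumed chunk size; pick one that keeps each chunk document small.
    public const int ChunkSize = 10_000;

    // Cuts a long list into arrays of at most `size` elements.
    public static IEnumerable<double[]> Split(IList<double> values, int size)
    {
        for (int start = 0; start < values.Count; start += size)
        {
            int length = Math.Min(size, values.Count - start);
            var chunk = new double[length];
            for (int i = 0; i < length; i++) chunk[i] = values[start + i];
            yield return chunk;
        }
    }

    public static int Save(LiteDatabase db, string name, IList<IList<double>> items)
    {
        var headers = db.GetCollection<TestHeader>("headers");
        var chunks = db.GetCollection<ValueChunk>("chunks");
        chunks.EnsureIndex(c => c.HeaderId);

        // Insert returns the generated id of the small header document.
        int headerId = headers.Insert(new TestHeader { Name = name, ItemCount = items.Count }).AsInt32;

        for (int itemIndex = 0; itemIndex < items.Count; itemIndex++)
        {
            int sequence = 0;
            foreach (var part in Split(items[itemIndex], ChunkSize))
            {
                chunks.Insert(new ValueChunk
                {
                    HeaderId = headerId,
                    ItemIndex = itemIndex,
                    Sequence = sequence++,
                    Values = part
                });
            }
        }
        return headerId;
    }

    // Loads the values of one item; the database must stay open while enumerating.
    public static IEnumerable<double> ReadItem(LiteDatabase db, int headerId, int itemIndex)
    {
        return db.GetCollection<ValueChunk>("chunks")
            .Find(c => c.HeaderId == headerId && c.ItemIndex == itemIndex)
            .OrderBy(c => c.Sequence)
            .SelectMany(c => c.Values);
    }
}
```

With this layout each document LiteDB serializes or deserializes is a single chunk, so the large intermediate buffers scale with the chunk size rather than with the whole data set. The trade-off is that a full read becomes many small reads instead of one FindOne.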