I have a project where I have to store 16 objects, each containing a list of 185,000 doubles. The overall size of the saved object should be around 20-30 MB (sizeof(double) * 16 * 185,000 ≈ 23.7 MB), but when I try to retrieve it from the database, LiteDB allocates about 200 MB to retrieve this 20-30 MB object.
My questions are: why does LiteDB allocate so much more memory than the size of the stored object, and how can I avoid these huge allocations?
Here is a fully reproducible example:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using LiteDB;

class Program
{
    private static string _path;

    static void Main(string[] args)
    {
        _path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "testDb");
        // Comment this out after the first insert to avoid adding the same object again.
        AddData();
        var data = GetData();
        Console.ReadLine();
    }

    public static void AddData()
    {
        var items = new List<Item>();
        for (var index = 0; index < 16; index++)
        {
            var item = new Item { Values = Enumerable.Range(0, 185_000).Select(v => (double) v).ToList() };
            items.Add(item);
        }

        var testData = new TestClass { Name = "Test1", Items = items.ToList() };

        using (var db = new LiteDatabase(_path))
        {
            var collection = db.GetCollection<TestClass>();
            collection.Insert(testData);
        }
    }

    public static TestClass GetData()
    {
        using (var db = new LiteDatabase(_path))
        {
            var collection = db.GetCollection<TestClass>();
            // This line causes a huge memory allocation and wakes up the garbage collector many, many times.
            return collection.FindOne(Query.EQ(nameof(TestClass.Name), "Test1"));
        }
    }
}

public class TestClass
{
    public int Id { get; set; }
    public string Name { get; set; }
    public IList<Item> Items { get; set; }
}

public class Item
{
    public IList<double> Values { get; set; }
}
Changing 185_000 to 1_850_000 makes my RAM usage go to >4 GB(!)
There are several reasons why LiteDB allocates much more memory than a plain List<double>.
To understand this, you need to know that your typed class is converted into a BsonDocument structure (made of BsonValues). This structure has an overhead (+1 or +5 bytes per BsonValue).
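You can see this conversion for yourself with a small sketch. It assumes the LiteDB v4 helpers BsonMapper.Global.ToDocument and BsonSerializer.Serialize (which I believe are public in v4) and the testData object from the question's AddData() method:

// Sketch: compare the raw double payload with the serialized BSON size.
// Assumes LiteDB v4's BsonMapper.Global.ToDocument and BsonSerializer.Serialize.
var doc  = BsonMapper.Global.ToDocument(testData);   // typed class -> BsonDocument (BsonValues)
var bson = BsonSerializer.Serialize(doc);            // BsonDocument -> single byte[] in BSON format

Console.WriteLine($"Raw doubles:     {16L * 185_000 * sizeof(double):N0} bytes");
Console.WriteLine($"Serialized BSON: {bson.Length:N0} bytes");   // noticeably larger than the raw doubles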
Also, to serialize this class (when you insert), LiteDB must create one single byte[] containing the whole BsonDocument (in BSON format). Afterwards, this very large byte[] is copied into many extend pages (each page holds a byte[4070]).
Not only that: LiteDB must also keep the original data to store in the journal area, so this size can be doubled.
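To get a feel for the numbers, here is a rough back-of-envelope calculation based only on the figures above (real pages also carry headers and the BSON form is larger than the raw doubles, so treat these as lower bounds):

// Rough estimate of extend pages and journal cost for the example data.
const int ValuesPerItem = 185_000;
const int ItemCount     = 16;
const int PageDataSize  = 4070;                                     // usable bytes per extend page (see above)

long rawBytes    = (long)ItemCount * ValuesPerItem * sizeof(double); // 23,680,000 bytes of raw doubles
long pages       = (rawBytes + PageDataSize - 1) / PageDataSize;     // roughly 5,800 extend pages just for the raw data
long withJournal = rawBytes * 2;                                     // the journal keeps another copy during the write

Console.WriteLine($"raw: {rawBytes:N0} B, extend pages: {pages:N0}, with journal: ~{withJournal:N0} B");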
To deserialize, LiteDB must do the inverse process: read all pages from disk into memory, join all pages into a single byte[], deserialize it into a BsonDocument, and finally map it to your class.
These operations are fine for small objects: the memory is reused for each new document read/write, so memory usage stays under control.
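If you want to see the read-path allocations without attaching a profiler, a small sketch like this works (the GetDataMeasured name is only illustrative, and GC.GetAllocatedBytesForCurrentThread is only available on newer runtimes such as .NET Core 3.0+/.NET 5+):

// Sketch: measure managed allocations around the FindOne call.
// Assumes a runtime that exposes GC.GetAllocatedBytesForCurrentThread.
public static TestClass GetDataMeasured()
{
    long before = GC.GetAllocatedBytesForCurrentThread();

    using (var db = new LiteDatabase(_path))
    {
        var collection = db.GetCollection<TestClass>();
        var result = collection.FindOne(Query.EQ(nameof(TestClass.Name), "Test1"));

        long after = GC.GetAllocatedBytesForCurrentThread();
        Console.WriteLine($"Allocated during FindOne: {(after - before) / (1024.0 * 1024.0):F1} MB");
        return result;
    }
}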
In the next v5 version this process has some optimizations, such as:
- Deserialization no longer needs a single byte[] to read a document; this can be done using the new ChunkStream(IEnumerable<byte[]>). Serialization still needs a single byte[].
- ExtendPages are not stored in the cache anymore.
For future versions I am thinking about using the new Span<T> class to re-use previous memory allocations, but I need to study this more.
But storing a single document with 185,000 values is not the best solution in any NoSQL database. MongoDB limits BSON document size to 16 MB (and early versions had a ~368 KB limit)... I limited LiteDB to 1 MB in v2, but I removed this size check and kept it only as a recommendation to avoid large single documents.
Try splitting your class into 2 collections: one for your data and another for the values. You can also split this large array into chunks, as LiteDB FileStorage or MongoDB GridFS do; see the sketch below.
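Here is a minimal sketch of the chunking idea. The ValueChunk class, the 10_000 chunk size and the method names are only illustrative (not a LiteDB API), and it reuses the usings and _path field from the question's example:

// Illustrative chunked layout: many small documents instead of one huge one.
public class ValueChunk
{
    public int Id { get; set; }
    public int ItemId { get; set; }        // which logical Item this chunk belongs to
    public int Order { get; set; }         // position of this chunk within the item
    public double[] Values { get; set; }   // e.g. 10_000 doubles per document
}

public static void AddChunkedData(int itemId, IList<double> allValues)
{
    const int chunkSize = 10_000;          // assumed chunk size, tune to your data

    using (var db = new LiteDatabase(_path))
    {
        var chunks = db.GetCollection<ValueChunk>();
        chunks.EnsureIndex(c => c.ItemId);

        for (int i = 0, order = 0; i < allValues.Count; i += chunkSize, order++)
        {
            chunks.Insert(new ValueChunk
            {
                ItemId = itemId,
                Order  = order,
                Values = allValues.Skip(i).Take(chunkSize).ToArray()
            });
        }
    }
}

public static IEnumerable<double> ReadChunkedData(int itemId)
{
    using (var db = new LiteDatabase(_path))
    {
        var chunks = db.GetCollection<ValueChunk>();

        // Each chunk is a small document, so LiteDB never has to build one
        // single 20-30 MB byte[] to deserialize the whole data set.
        foreach (var chunk in chunks.Find(Query.EQ("ItemId", itemId)).OrderBy(c => c.Order))
        {
            foreach (var value in chunk.Values)
            {
                yield return value;
            }
        }
    }
}

This way each stored document stays far below 1 MB, so the single byte[] per document, the extend-page copying and the journal copy all stay small for every read and write.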