Tags: c#, performance, optimization, serialization, lookup-tables

Storing Large Lookup Tables


I am developing an app that utilizes very large lookup tables to speed up mathematical computations. The largest of these tables is an int[] that has ~10 million entries. Not all of the lookup tables are int[]. For example, one is a Dictionary with ~200,000 entries. Currently, I generate each lookup table once (which takes several minutes) and serialize it to disk (with compression) using the following snippet:

    int[] lut = GenerateLUT();
    lut.Serialize("lut");

where Serialize is defined as follows:

    public static void Serialize(this object obj, string file)
    {
        using (FileStream stream = File.Open(file, FileMode.Create))
        {
            using (var gz = new GZipStream(stream, CompressionMode.Compress))
            {
                var formatter = new BinaryFormatter();
                formatter.Serialize(gz, obj);
            }
        }
    }

My problem is that, when the application launches, deserializing these lookup tables takes a very long time (upwards of 15 seconds). A delay like this will annoy users, since the app is unusable until all the lookup tables are loaded. Currently, the deserialization is as follows:

    Dictionary<string, int> lut1 = (Dictionary<string, int>) Deserialize("lut1");
    int[] lut2 = (int[]) Deserialize("lut2");
    ...

where Deserialize is defined as:

    public static object Deserialize(string file)
    {
        using (FileStream stream = File.Open(file, FileMode.Open))
        {
            using (var gz = new GZipStream(stream, CompressionMode.Decompress))
            {
                var formatter = new BinaryFormatter();
                return formatter.Deserialize(gz);
            }
        }
    }

At first, I thought the gzip compression might be causing the slowdown, but removing it only shaved a few hundred milliseconds off the serialization/deserialization routines.

Can anyone suggest a way of speeding up the load times of these lookup tables upon the app's initial startup?


Solution

  • First, deserializing in a background thread will prevent the app from "hanging" while this happens. That alone may be enough to take care of your problem.

    However, serialization and deserialization (especially of large dictionaries) tend to be very slow in general. Depending on the data structure, writing your own serialization code can dramatically speed this up, particularly if there are no shared references in the data structures.

    That being said, depending on the usage pattern, a database might be a better approach. You could make something more database-oriented and build the lookup table lazily from the DB (i.e., a lookup first checks the LUT, and if the entry doesn't exist there, loads it from the DB and saves it in the table). This would make startup instantaneous (at least in terms of the LUT), and would probably still keep lookups fairly snappy.
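
As a rough illustration of the first two suggestions, here is a minimal sketch of hand-written serialization for the int[] table, plus a helper that loads it on a background thread. The names LutFile, Write, Read, and ReadAsync are hypothetical and not part of the original code. The sketch assumes .NET 4.5 or later (for Task.Run), handles only int[] (a Dictionary would need its own format), skips compression, and stores the values in the machine's native byte order, so the file is not portable across architectures with different endianness.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: stores an int[] as a length prefix plus raw bytes.
public static class LutFile
{
    // Writes a 4-byte element count followed by the raw element bytes.
    public static void Write(string path, int[] table)
    {
        var bytes = new byte[table.Length * sizeof(int)];
        // BlockCopy counts in bytes, not elements.
        Buffer.BlockCopy(table, 0, bytes, 0, bytes.Length);

        using (var stream = File.Create(path))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(table.Length);
            writer.Write(bytes);
        }
    }

    // Reads a table previously written by Write.
    public static int[] Read(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var reader = new BinaryReader(stream))
        {
            int count = reader.ReadInt32();
            byte[] bytes = reader.ReadBytes(count * sizeof(int));
            // ReadBytes returns fewer bytes if the file ends early.
            if (bytes.Length != count * sizeof(int))
                throw new EndOfStreamException("Lookup table file is truncated.");

            var table = new int[count];
            Buffer.BlockCopy(bytes, 0, table, 0, bytes.Length);
            return table;
        }
    }

    // Runs Read on a thread-pool thread so the caller (for example,
    // the UI thread) is not blocked while the file is loaded.
    public static Task<int[]> ReadAsync(string path)
    {
        return Task.Run(() => Read(path));
    }
}
```

At startup, the app could call LutFile.ReadAsync("lut2.bin") (an assumed file name), show its UI immediately, and await the returned task only when the table is first needed. Whether this is fast enough should be measured against the BinaryFormatter version on the real data.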