.net sqlite storage hdf5

Please suggest data storage for single user application


I am looking for a data storage option for storing ECG (1000 samples /sec) and other patient data (e.g. blood pressure, body temperature etc - sampled at much lower sampling rate) in a queryable storage for my C# application.

I have already evaluated SQLite (which is a great option in its own right), but I am looking for an option that would meet the following requirements:

  1. Small storage space - ECG is typically sampled at 1000 samples/sec, and I need to store ECG data for 24 - 48 hours (~86 to ~173 million data samples). In SQLite this takes a huge amount of space.

  2. I should be able to quickly read a portion of this data (from - to timestamps).

  3. I should be able to modify portions of the data without having to rewrite all the data from that point onwards.

I have also looked at HDF5, but haven't really understood how to use it from C#.net.

Looking for practical suggestions.

Thanks,

Vikram


Solution

  • Your use case seems to be a perfect fit for HDF5.

    1. Small storage space - ECG is typically sampled at 1000 samples/sec, and I need to store ECG data for 24 - 48 hours (~86 to ~173 million data samples). In SQLite this takes a huge amount of space.

    HDF5 allows for very efficient and compact storage. Furthermore, you can enable different compression algorithms/filters (gzip, bzip, etc.) without too much of a performance hit.
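To get a feel for the numbers, here is a rough sketch of the raw storage cost and of what deflate compression (the algorithm behind the gzip filter) does to packed binary samples. It uses only the Python standard library, not HDF5 itself. The 16-bit sample width and the synthetic sine signal are assumptions made for illustration, so the compression ratio on real ECG data will differ.

```python
import array
import math
import zlib

SAMPLE_RATE_HZ = 1000   # from the question
BYTES_PER_SAMPLE = 2    # assumption: each sample fits a signed 16-bit integer


def raw_size_bytes(hours):
    """Uncompressed size of one ECG channel stored as packed 16-bit samples."""
    return int(hours * 3600 * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE


def synthetic_signal(n_samples):
    """Hypothetical stand-in for ECG: a 1 Hz sine wave scaled to +/-1000."""
    return array.array(
        "h",
        (int(1000 * math.sin(2 * math.pi * i / SAMPLE_RATE_HZ)) for i in range(n_samples)),
    )


print(raw_size_bytes(24))   # 172800000 bytes, about 173 MB for 24 hours
print(raw_size_bytes(48))   # 345600000 bytes, about 346 MB for 48 hours

raw = synthetic_signal(10 * SAMPLE_RATE_HZ).tobytes()   # 10 seconds of samples
packed = zlib.compress(raw, 6)                          # deflate at a medium level
print(len(raw), len(packed))                            # packed is smaller than raw
```

Packed binary samples avoid the per-row overhead of a relational table, which is the main reason an array-oriented format ends up so much smaller.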

    2. I should be able to quickly read a portion of this data (from - to timestamps).

    This is actually one of the main use cases for HDF5. Reading a slice of a dataset is fast and does not require loading the whole dataset.
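The reason range reads are cheap for uniformly sampled data can be sketched without HDF5: a timestamp maps directly to a sample index, so the reader can jump to the right offset. The example below uses a plain binary file of 16-bit samples as a stand-in for an HDF5 dataset. The file name, the `read_range` helper, and the sample width are hypothetical.

```python
import array
import os
import tempfile

SAMPLE_RATE_HZ = 1000                    # from the question
ITEMSIZE = array.array("h").itemsize     # bytes per signed 16-bit sample


def read_range(path, start_s, stop_s):
    """Return the samples recorded in [start_s, stop_s), in seconds from the start."""
    first = round(start_s * SAMPLE_RATE_HZ)
    count = round(stop_s * SAMPLE_RATE_HZ) - first
    samples = array.array("h")
    with open(path, "rb") as f:
        f.seek(first * ITEMSIZE)                     # jump straight to the first sample
        samples.frombytes(f.read(count * ITEMSIZE))  # read only the requested span
    return samples


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ecg.raw")              # hypothetical file name
    with open(path, "wb") as f:
        array.array("h", range(5000)).tofile(f)      # 5 s of data; sample i holds value i
    chunk = read_range(path, 2.0, 2.5)

print(len(chunk), chunk[0], chunk[-1])   # 500 2000 2499
```

In an HDF5 library the same index arithmetic would feed a slice or hyperslab selection on the dataset instead of a manual seek.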

    3. I should be able to modify portions of the data without having to rewrite all the data from that point onwards.

    It is possible to extend the dataset and also modify data in place (though it's not as convenient as an UPDATE statement in SQLite). However, there are some caveats regarding deleting data (see here for more information).
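As a minimal illustration of what "modify in place" means for fixed-width samples, the sketch below overwrites three values in the middle of a plain binary file and leaves everything after them untouched. It is a stand-in for writing to a slice of an HDF5 dataset, not HDF5 code. The `overwrite_samples` helper and the file name are hypothetical.

```python
import array
import os
import tempfile

ITEMSIZE = array.array("h").itemsize     # bytes per signed 16-bit sample


def overwrite_samples(path, first_index, new_samples):
    """Replace samples starting at first_index without rewriting the rest of the file."""
    with open(path, "r+b") as f:             # read/write mode, no truncation
        f.seek(first_index * ITEMSIZE)
        f.write(new_samples.tobytes())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ecg.raw")      # hypothetical file name
    with open(path, "wb") as f:
        array.array("h", range(5000)).tofile(f)   # sample i holds value i
    size_before = os.path.getsize(path)

    overwrite_samples(path, 1000, array.array("h", [7, 8, 9]))
    size_after = os.path.getsize(path)

    data = array.array("h")
    with open(path, "rb") as f:
        data.frombytes(f.read())

print(size_before == size_after)      # True
print(list(data[998:1005]))           # [998, 999, 7, 8, 9, 1003, 1004]
```

Overwriting works because the replacement has the same size as the original span. Removing samples is a different matter, since it would shift everything after them, which is why deletion comes with caveats.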

    If you have a lot of meta-information, you can consider storing it in SQLite and linking those records to the HDF5 files that contain the raw data. Alternatively, you can store the meta-information as attributes on your nodes/datasets in HDF5 and avoid using SQLite altogether.
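One possible shape for the SQLite side of such a hybrid layout is sketched below, using Python's built-in `sqlite3` module. All table names, column names, file paths, and sample values are hypothetical. The idea is only that SQLite holds the queryable metadata and low-rate measurements, while each row points at the file holding the raw ECG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database, for illustration only
conn.executescript("""
CREATE TABLE recording (
    id             INTEGER PRIMARY KEY,
    patient_id     TEXT    NOT NULL,
    started_at     TEXT    NOT NULL,   -- ISO 8601 timestamp
    sample_rate_hz INTEGER NOT NULL,
    data_path      TEXT    NOT NULL    -- HDF5 file holding the raw ECG samples
);
CREATE TABLE vital_sign (
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    measured_at  TEXT    NOT NULL,     -- ISO 8601 timestamp
    kind         TEXT    NOT NULL,     -- e.g. 'body_temp_c', 'systolic_mmhg'
    value        REAL    NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO recording (patient_id, started_at, sample_rate_hz, data_path) "
    "VALUES (?, ?, ?, ?)",
    ("P001", "2024-01-01T08:00:00Z", 1000, "recordings/P001_0001.h5"),
)
rec_id = cur.lastrowid

conn.executemany(
    "INSERT INTO vital_sign VALUES (?, ?, ?, ?)",
    [
        (rec_id, "2024-01-01T08:05:00Z", "body_temp_c", 36.8),
        (rec_id, "2024-01-01T09:05:00Z", "body_temp_c", 37.1),
        (rec_id, "2024-01-01T08:05:00Z", "systolic_mmhg", 121.0),
    ],
)

# Find the raw-data file and the temperature readings inside a time window.
rows = conn.execute(
    """
    SELECT r.data_path, v.measured_at, v.value
    FROM recording r
    JOIN vital_sign v ON v.recording_id = r.id
    WHERE r.patient_id = ? AND v.kind = ? AND v.measured_at BETWEEN ? AND ?
    ORDER BY v.measured_at
    """,
    ("P001", "body_temp_c", "2024-01-01T08:00:00Z", "2024-01-01T08:30:00Z"),
).fetchall()

print(rows)   # [('recordings/P001_0001.h5', '2024-01-01T08:05:00Z', 36.8)]
```

The low-rate measurements stay easy to query with SQL, and the application opens the referenced file only when it needs the high-rate signal.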

    The only big problem/challenge regarding HDF5 is concurrent write operations. So if you have the requirement of concurrent write operations on a single HDF5 file it becomes more complex.

    For using HDF5 in .NET you can check out this thread.