I want to store an array of timestamps in a binary flat file. One of my requirements is that I can access individual timestamps later for efficient queries, without having to read and deserialize the entire array first. (I use a binary search algorithm that finds the file positions of a start and an end timestamp, which in turn determine which bytes between them are read and deserialized, because the binary file can be multiple gigabytes in size.)
Obviously, the simple but slow way is to use BitConverter.GetBytes(timestamp) to convert each timestamp to bytes and store those in the file. I can then access each item individually in the file and use my custom binary search to find the one that matches the desired timestamp.
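For illustration, here is a minimal sketch of that fixed-record layout; storing each timestamp as its DateTime.Ticks value and the file name are my assumptions, not part of the original setup:

// Hypothetical sketch: each timestamp becomes a fixed 8-byte record,
// so the record at index i always starts at byte offset i * sizeof(long).
DateTime[] timestamps = { new DateTime(2020, 1, 1), new DateTime(2020, 1, 2) };
using (var file = new FileStream("timestamps.bin", FileMode.Create))
{
    foreach (var t in timestamps)
        file.Write(BitConverter.GetBytes(t.Ticks), 0, sizeof(long));
}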
However, I found that BinaryFormatter is incredibly efficient at serializing/deserializing value-type arrays (multiple times faster than protobuf-net and any other serializer I tried). Hence I attempted to serialize an entire array of timestamps into binary form. Apparently, however, that prevents me from accessing individual timestamps in the file without first deserializing the whole array.
Is there a way to still access individual items in binary form after having serialized an entire array of items via BinaryFormatter?
Here is a code snippet that demonstrates what I mean:
var sampleArray = new int[5] { 1, 2, 3, 4, 5 };
var serializedSingleValueArray = sampleArray.SelectMany(x => BitConverter.GetBytes(x)).ToArray();
var serializedArrayofSingleValues = Serializers.BinarySerializeToArray(sampleArray);
var deserializesToCorrectValue = BitConverter.ToInt32(serializedSingleValueArray, 0); //value = 1 (ok)
var wrongDeserialization = BitConverter.ToInt32(serializedArrayofSingleValues, 0); //value = 256 (???)
Here is the serialization function:
public static byte[] BinarySerializeToArray(object toSerialize)
{
    using (var stream = new MemoryStream())
    {
        // Formatter is a BinaryFormatter instance.
        Formatter.Serialize(stream, toSerialize);
        return stream.ToArray();
    }
}
Edit: I do not need to concern myself with memory consumption or file size, as those are currently far from being the bottleneck. With multi-gigabyte binary files, and hence very large arrays of primitives, the bottleneck is the speed of serialization and deserialization.
If your problem is just "how to convert an array of structs to a byte[]", you have other options than BitConverter. BitConverter is for single values; the Buffer class is for arrays.
double[] d = new double[100];
d[4] = 1235;
d[8] = 5678;
byte[] b = new byte[800]; // 100 doubles * 8 bytes each

// Copy the raw bytes of the double[] into the byte[] in one call.
Buffer.BlockCopy(d, 0, b, 0, d.Length * sizeof(double));

// Just to test that the round trip works:
double[] d1 = new double[100];
Buffer.BlockCopy(b, 0, d1, 0, d.Length * sizeof(double));
This does a byte-level copy without converting anything and without iterating over items.
You can write this byte array directly to your stream (not a StreamWriter, not a Formatter):
stream.Write(b, 0, 800);
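For example, with a plain FileStream, reusing the buffer b from above (the file name and FileMode here are placeholders):

using (var stream = new FileStream("data.bin", FileMode.Create, FileAccess.Write))
{
    stream.Write(b, 0, b.Length); // raw bytes, no formatter or writer in between
}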
That's definitely the fastest way to write to a file. It does involve a complete copy, but probably any other conceivable method would also read each item and store it somewhere before it goes to the file.
If this is the only thing you write to your file, you don't need to store the array length in the file; you can derive it from the file length.
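For example (assuming file is an open FileStream over a file that contains nothing but doubles):

long count = file.Length / sizeof(double); // no stored length prefix needed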
To read the double at index 100 in the file:
file.Seek(100 * sizeof(double), SeekOrigin.Begin);
byte[] tmp = new byte[sizeof(double)];
file.Read(tmp, 0, tmp.Length);
double value = BitConverter.ToDouble(tmp, 0);
Here, for a single value, you can use BitConverter.
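Since every record has a fixed size, the binary search from the question boils down to seek-and-compare, reading only one value per probe. Here is a minimal sketch under the assumption that the file contains nothing but 8-byte timestamps stored as long ticks; the method name LowerBound is mine, not an established API:

// Returns the index of the first record >= targetTicks
// (equal to the record count if no such record exists).
static long LowerBound(FileStream file, long targetTicks)
{
    var tmp = new byte[sizeof(long)];
    long lo = 0, hi = file.Length / sizeof(long);
    while (lo < hi)
    {
        long mid = lo + (hi - lo) / 2;
        file.Seek(mid * sizeof(long), SeekOrigin.Begin);
        file.Read(tmp, 0, tmp.Length);
        if (BitConverter.ToInt64(tmp, 0) < targetTicks)
            lo = mid + 1;   // middle record too small, search the upper half
        else
            hi = mid;       // middle record is a candidate, search the lower half
    }
    return lo;
}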
This is the solution for .NET Framework (C# <= 7.0). For .NET Standard/.NET Core (C# 8.0) you have more options with Span<T>, which gives you access to the underlying memory without copying the data.
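A minimal sketch of that approach (assuming .NET Core 2.1 or later; MemoryMarshal lives in System.Runtime.InteropServices, and the span-based Stream.Write overload takes a ReadOnlySpan<byte>):

double[] d = new double[100];
d[4] = 1235;

// Reinterpret the double[] as bytes without copying anything.
ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(d.AsSpan());

using (var stream = new FileStream("data.bin", FileMode.Create))
{
    stream.Write(bytes); // span overload, no intermediate byte[]
}

// Reading back: reinterpret a byte buffer as doubles, again copy-free.
byte[] buffer = File.ReadAllBytes("data.bin");
ReadOnlySpan<double> values = MemoryMarshal.Cast<byte, double>(buffer.AsSpan());
Console.WriteLine(values[4]); // prints 1235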