I am using protobuf to serialize data to disk. I may have a large set of protobuf objects, say, millions of them. What is the best way to lay them out on disk? The protobuf objects will be read sequentially one by one, or accessed randomly through an external index.
I used to use a length(int)+protobuf_object+length(int)... format, but it fails if one of the protobuf objects happens to be dirty. Also, if many of the protobuf objects are small, the length fields may add some overhead.
If you only need sequential access, the easiest way to store multiple messages is to write the size of each object before it, as recommended by the documentation: http://developers.google.com/protocol-buffers/docs/techniques#streaming
For example, you can create a class 'MessagesFile' with the following member functions to open, read and write your messages:
// The file is opened in append mode and wrapped in
// a FileOutputStream and a CodedOutputStream
bool Open(const std::string& filename,
          int buffer_size = kDefaultBufferSize) {
  file_ = open(filename.c_str(),
               O_WRONLY | O_APPEND | O_CREAT,          // open mode
               S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH); // file permissions (rw-r--r--); no setuid bit on a data file
  if (file_ != -1) {
    file_ostream_ = new FileOutputStream(file_, buffer_size);
    ostream_ = new CodedOutputStream(file_ostream_);
    return true;
  } else {
    return false;
  }
}
// Appends a new message: a 4-byte size prefix followed by the message bytes
bool Serialize(const google::protobuf::Message& message) {
  ostream_->WriteLittleEndian32(message.ByteSize());
  return message.SerializeToCodedStream(ostream_);
}
// Reads the next message using a FileInputStream
// wrapped in a CodedInputStream
bool Next(google::protobuf::Message* msg) {
  google::protobuf::uint32 size;
  if (!istream_->ReadLittleEndian32(&size)) {
    return false;  // end of file or truncated size prefix
  }
  // Restrict parsing to the bytes of this message
  CodedInputStream::Limit msg_limit = istream_->PushLimit(size);
  bool parsed = msg->ParseFromCodedStream(istream_);
  // Pop the limit even if parsing failed, so the stream is not left limited
  istream_->PopLimit(msg_limit);
  return parsed;
}
Then, to write your messages use:
MessagesFile file;
file.Open("your_file.dat");
file.Serialize(your_message1);
file.Serialize(your_message2);
...
// close the file
To read all your messages:
MessagesFile reader;
reader.Open("your_file.dat");
MyMsg msg;
while (reader.Next(&msg)) {
  // use your message
}
...
// close the file
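For the random-access case mentioned in the question, one possible approach is to keep the same size-prefixed layout and have the writer remember the byte offset at which each record starts; those offsets are what an external index would store. The following is only a minimal sketch under some assumptions: it works on plain byte strings (for protobuf you could pass the result of `SerializeAsString()` and parse with `ParseFromString()`), and the names `AppendRecord`, `ReadRecordAt`, `DemoSecondRecord` and the limit `kMaxRecordSize` are hypothetical, not part of protobuf or of the class above.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sanity limit: a larger size prefix is treated as a dirty record.
constexpr std::uint32_t kMaxRecordSize = 64u << 20;  // 64 MiB

// Appends one record as a 4-byte little-endian length followed by the payload.
// Returns the offset where the record starts, to be stored in the index.
// Assumption: tellp() reports the real write position (true for a stream
// that is only written through this function).
std::uint64_t AppendRecord(std::ostream& out, const std::string& payload) {
  const std::uint64_t offset = static_cast<std::uint64_t>(out.tellp());
  const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
  const char prefix[4] = {
      static_cast<char>(size & 0xFF),
      static_cast<char>((size >> 8) & 0xFF),
      static_cast<char>((size >> 16) & 0xFF),
      static_cast<char>((size >> 24) & 0xFF)};
  out.write(prefix, 4);
  out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
  return offset;
}

// Seeks to `offset` and reads the record stored there. Returns false if the
// size prefix or the payload is truncated, or if the size looks implausible.
bool ReadRecordAt(std::istream& in, std::uint64_t offset, std::string* payload) {
  in.clear();  // drop a stale eof/fail state left by an earlier read
  in.seekg(static_cast<std::streamoff>(offset));
  unsigned char prefix[4];
  if (!in.read(reinterpret_cast<char*>(prefix), 4)) return false;
  const std::uint32_t size = static_cast<std::uint32_t>(prefix[0]) |
                             (static_cast<std::uint32_t>(prefix[1]) << 8) |
                             (static_cast<std::uint32_t>(prefix[2]) << 16) |
                             (static_cast<std::uint32_t>(prefix[3]) << 24);
  if (size > kMaxRecordSize) return false;
  payload->resize(size);
  if (size == 0) return true;
  return static_cast<bool>(in.read(&(*payload)[0], size));
}

// Usage: build a small in-memory "file" and fetch the second record by offset.
std::string DemoSecondRecord() {
  std::stringstream file;
  AppendRecord(file, "first");
  const std::uint64_t second = AppendRecord(file, "second");
  std::string payload;
  return ReadRecordAt(file, second, &payload) ? payload : std::string();
}
```

The offsets returned by `AppendRecord` can be kept in whatever index you already have. Sequential reading still works as before, because the on-disk layout is unchanged.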