I'm currently working on a small project which requires loading messages from a file. The messages are stored sequentially in the file and files can become huge, so loading the entire file content into memory is unrewarding.
Therefore we decided to implement a FileReader
class that is capable of moving to specific elements in the file quickly and load them on request. Commonly used something along the following lines
SpecificMessage m;
FileReader fr;
fr.open("file.bin");
fr.moveTo(120); // Move to Message #120
fr.read(&m); // Try deserializing as SpecificMessage
The FileReader per se works great. Therefore we thought about adding STL compliant iterator support as well: A random access iterator that provides read-only references to specific messages. Used in the following way
for (auto iter = fr.begin<SpecificMessage>(); iter != fr.end<SpecificMessage>(); ++iter) {
// ...
}
Remark: the above assumes that the file only contains messages of type SpecificMessage. We've been using boost::iterator_facade
to simplify the implementation.
Now my question boils down to: how to implement the iterator correctly? Since FileReader
does not actually hold a sequence of messages internally, but loads them on request.
What we've tried so far:
Storing the message as an iterator member
This approach stores the message in the iterator instance. Which works great for simple use-cases but fails for more complex uses. E.g. std::reverse_iterator
has a dereference operation that looks like this
reference operator*() const
{ // return designated value
_RanIt _Tmp = current;
return (*--_Tmp);
}
This breaks our approach as a reference to a message from a temporary iterator is returned.
Making the reference type equal the value type
@DDrmmr in the comments suggested making the reference type equal the value type, so that a copy of the internally stored object is returned. However, I think this is not valid for the reverse iterator which implements the -> operator as
pointer operator->() const {
return (&**this);
}
which derefs itself, calls the *operator which then returns a copy of a temporary and finally returns the address of this temporary.
Storing the message externally
Alternatively I though about storing the message externally:
SpecificMessage m;
auto iter = fr.begin<SpecificMessage>(&m);
// ...
which also seems to be flawed for
auto iter2 = iter + 2
which will have both iter2
and iter
point to the same content.
As I hinted in my other answer, you could consider using memory mapped files. In the comment you asked:
As far as memory mapped files is concerned, this seems not what I want to have, as how would you provide an iterator over SpecificMessages for them?
Well, if your SpecificMessage is a POD type, you could just iterate over the raw memory directly. If not, you could have a deserialization helper (as you already have) and use Boost transform_iterator
to do the deserialization on demand.
Note that we can make the memory mapped file managed, effectively meaning that you can just use it as a regular heap, and you can store all standard containers. This includes node-based containers (map<>
, e.g.), dynamic-size containers (e.g. vector<>
) in addition to the fixed-size containers (array<>
) - and any combinations of those.
Here's a demo that takes a simple SpecificMessage
that contains a string, and (de)derializes it directly into shared memory:
using blob_t = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;
The part that interests you would be the consuming part:
bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;
// for fun, let's reverse the blobs
for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
std::cout << "blob: '" << first->contents << "'\n";
// any kind of random access is okay, though:
auto random = rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
So this prints each 13th message, in reverse order, followed by a random blob.
The sample online uses the lines of the sources as "messages".
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <iostream>
#include <boost/iterator/transform_iterator.hpp>
#include <boost/range/iterator_range.hpp>
static char const* DBASE_FNAME = "database.map";
namespace bip = boost::interprocess;
namespace shm {
using segment_manager = bip::managed_mapped_file::segment_manager;
template <typename T> using allocator = boost::container::scoped_allocator_adaptor<bip::allocator<T, segment_manager> >;
template <typename T> using vector = bip::vector<T, allocator<T> >;
}
using blob_t = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;
struct SpecificMessage {
// for demonstration purposes, just a string; could be anything serialized
std::string contents;
// trivial save/load serialization code:
template <typename Blob>
friend bool save(Blob& blob, SpecificMessage const& msg) {
blob.assign(msg.contents.begin(), msg.contents.end());
return true;
}
template <typename Blob>
friend bool load(Blob const& blob, SpecificMessage& msg) {
msg.contents.assign(blob.begin(), blob.end());
return true;
}
};
template <typename Message> struct LazyLoader {
using type = Message;
Message operator()(blob_t const& blob) const {
Message result;
if (!load(blob, result)) throw std::bad_cast(); // TODO custom excepion
return result;
}
};
///////
// for demo, create some database contents
void create_database_file() {
bip::file_mapping::remove(DBASE_FNAME);
bip::managed_mapped_file mmf(bip::open_or_create, DBASE_FNAME, 1ul<<20); // Even sparse file size is limited on Coliru
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
std::ifstream ifs("main.cpp");
std::string line;
while (std::getline(ifs, line)) {
table->emplace_back();
save(table->back(), SpecificMessage { line });
}
std::cout << "Created blob table consisting of " << table->size() << " blobs\n";
}
///////
void display_random_messages() {
bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;
// for fun, let's reverse the blobs
for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
std::cout << "blob: '" << first->contents << "'\n";
// any kind of random access is okay, though:
auto random = rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
}
int main()
{
#ifndef CONSUMER_ONLY
create_database_file();
#endif
srand(time(NULL));
display_random_messages();
}