Search code examples
c++boostiteratorstdrandom-access

C++ random access iterators for containers with elements loaded on demand


I'm currently working on a small project which requires loading messages from a file. The messages are stored sequentially in the file and files can become huge, so loading the entire file content into memory is unrewarding.

Therefore we decided to implement a FileReader class that is capable of moving to specific elements in the file quickly and load them on request. Commonly used something along the following lines

SpecificMessage m;
FileReader fr;
fr.open("file.bin");
fr.moveTo(120); // Move to Message #120
fr.read(&m);    // Try deserializing as SpecificMessage 

The FileReader per se works great. Therefore we thought about adding STL compliant iterator support as well: A random access iterator that provides read-only references to specific messages. Used in the following way

for (auto iter = fr.begin<SpecificMessage>(); iter != fr.end<SpecificMessage>(); ++iter) {
  // ...
}

Remark: the above assumes that the file only contains messages of type SpecificMessage. We've been using boost::iterator_facade to simplify the implementation.

Now my question boils down to: how to implement the iterator correctly? Since FileReader does not actually hold a sequence of messages internally, but loads them on request.

What we've tried so far:

Storing the message as an iterator member

This approach stores the message in the iterator instance. Which works great for simple use-cases but fails for more complex uses. E.g. std::reverse_iterator has a dereference operation that looks like this

 reference operator*() const
 {  // return designated value
   _RanIt _Tmp = current;
   return (*--_Tmp);
 }

This breaks our approach as a reference to a message from a temporary iterator is returned.

Making the reference type equal the value type

@DDrmmr in the comments suggested making the reference type equal the value type, so that a copy of the internally stored object is returned. However, I think this is not valid for the reverse iterator which implements the -> operator as

pointer operator->() const {
  return (&**this);
}

which derefs itself, calls the *operator which then returns a copy of a temporary and finally returns the address of this temporary.

Storing the message externally

Alternatively I though about storing the message externally:

SpecificMessage m;
auto iter = fr.begin<SpecificMessage>(&m);
// ...

which also seems to be flawed for

auto iter2 = iter + 2

which will have both iter2 and iter point to the same content.


Solution

  • As I hinted in my other answer, you could consider using memory mapped files. In the comment you asked:

    As far as memory mapped files is concerned, this seems not what I want to have, as how would you provide an iterator over SpecificMessages for them?

    Well, if your SpecificMessage is a POD type, you could just iterate over the raw memory directly. If not, you could have a deserialization helper (as you already have) and use Boost transform_iterator to do the deserialization on demand.

    Note that we can make the memory mapped file managed, effectively meaning that you can just use it as a regular heap, and you can store all standard containers. This includes node-based containers (map<>, e.g.), dynamic-size containers (e.g. vector<>) in addition to the fixed-size containers (array<>) - and any combinations of those.

    Here's a demo that takes a simple SpecificMessage that contains a string, and (de)derializes it directly into shared memory:

    using blob_t       = shm::vector<uint8_t>;
    using shared_blobs = shm::vector<blob_t>;
    

    The part that interests you would be the consuming part:

    bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
    shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
    
    using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;
    
    // for fun, let's reverse the blobs
    for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
        std::cout << "blob: '" << first->contents << "'\n";
    
    // any kind of random access is okay, though:
    auto random = rand() % table->size();
    SpecificMessage msg;
    load(table->at(random), msg);
    std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
    

    So this prints each 13th message, in reverse order, followed by a random blob.

    Full Demo

    The sample online uses the lines of the sources as "messages".

    Live On Coliru

    #include <boost/interprocess/file_mapping.hpp>
    #include <boost/interprocess/managed_mapped_file.hpp>
    #include <boost/container/scoped_allocator.hpp>
    #include <boost/interprocess/containers/vector.hpp>
    #include <iostream>
    
    #include <boost/iterator/transform_iterator.hpp>
    #include <boost/range/iterator_range.hpp>
    
    static char const* DBASE_FNAME = "database.map";
    
    namespace bip = boost::interprocess;
    
    namespace shm {
        using segment_manager = bip::managed_mapped_file::segment_manager;
        template <typename T> using allocator = boost::container::scoped_allocator_adaptor<bip::allocator<T, segment_manager> >;
        template <typename T> using vector    = bip::vector<T, allocator<T> >;
    }
    
    using blob_t       = shm::vector<uint8_t>;
    using shared_blobs = shm::vector<blob_t>;
    
    struct SpecificMessage {
        // for demonstration purposes, just a string; could be anything serialized
        std::string contents;
    
        // trivial save/load serialization code:
        template <typename Blob>
        friend bool save(Blob& blob, SpecificMessage const& msg) {
            blob.assign(msg.contents.begin(), msg.contents.end());
            return true;
        }
    
        template <typename Blob>
        friend bool load(Blob const& blob, SpecificMessage& msg) {
            msg.contents.assign(blob.begin(), blob.end());
            return true;
        }
    };
    
    template <typename Message> struct LazyLoader {
        using type = Message;
    
        Message operator()(blob_t const& blob) const {
            Message result;
            if (!load(blob, result)) throw std::bad_cast(); // TODO custom excepion
            return result;
        }
    };
    
    ///////
    // for demo, create some database contents
    void create_database_file() {
        bip::file_mapping::remove(DBASE_FNAME);
        bip::managed_mapped_file mmf(bip::open_or_create, DBASE_FNAME, 1ul<<20); // Even sparse file size is limited on Coliru
    
        shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
    
        std::ifstream ifs("main.cpp");
        std::string line;
        while (std::getline(ifs, line)) {
            table->emplace_back();
            save(table->back(), SpecificMessage { line });
        }
    
        std::cout << "Created blob table consisting of " << table->size() << " blobs\n";
    }
    
    ///////
    
    void display_random_messages() {
        bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
        shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
    
        using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;
    
        // for fun, let's reverse the blobs
        for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
            std::cout << "blob: '" << first->contents << "'\n";
    
        // any kind of random access is okay, though:
        auto random = rand() % table->size();
        SpecificMessage msg;
        load(table->at(random), msg);
        std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
    }
    
    int main()
    {
    #ifndef CONSUMER_ONLY
        create_database_file();
    #endif
    
        srand(time(NULL));
        display_random_messages();
    }