Why does mapped_file::data return char* instead of void*?


Or, even better, a template returning a T*?

If the memory-mapped file contains a sequence of 32-bit integers and data() returned a void*, we could static_cast directly to std::uint32_t*.

Why did the Boost authors choose to return a char* instead?

EDIT: as pointed out, a translation layer is needed when portability is an issue. But saying that a file (or a chunk of memory in this case) is a stream of bytes, rather than a stream of bits, of IEEE 754 doubles, or of complex data structures, seems to me a very broad statement that needs more explanation.

Even when endianness has to be handled, being able to map directly to a vector of be_uint32_t as suggested (and as implemented here) would make the code much more readable:

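#include <arpa/inet.h> // ntohl (POSIX)
#include <cstdint>

struct be_uint32_t {
  std::uint32_t raw;
  // const, so it also works through a pointer to const (read-only mapping)
  operator std::uint32_t() const { return ntohl(raw); }
};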

static_assert(sizeof(be_uint32_t)==4, "POD failed");

Is it allowed/advised to cast to a be_uint32_t*? Why, or why not?

Which kind of cast should be used?

EDIT2: Since it seems difficult to get to the point without discussing whether the memory model of a computer is made of bits, bytes or words, I will rephrase with an example:

#include <cassert>  // assert
#include <cstddef>  // offsetof
#include <cstdint>
#include <iostream>
#include <boost/iostreams/device/mapped_file.hpp>

struct entry {
  std::uint32_t a;
  std::uint64_t b;
} __attribute__((packed)); /* compiler specific, but supported 
                              in other ways by all major compilers */

static_assert(sizeof(entry) == 12, "entry: Struct size mismatch");
static_assert(offsetof(entry, a) == 0, "entry: Invalid offset for a");
static_assert(offsetof(entry, b) == 4, "entry: Invalid offset for b");

int main(void) {
  boost::iostreams::mapped_file_source mmap("map");
  assert(mmap.is_open());
  const entry* data_begin = reinterpret_cast<const entry*>(mmap.data());
  const entry* data_end = data_begin + mmap.size()/sizeof(entry);
  for(const entry* ii=data_begin; ii!=data_end; ++ii)
    std::cout << std::hex << ii->a << " " << ii->b << std::endl;
  return 0;
}

Given that the map file contains the expected bits in the correct order, is there any other reason to avoid the reinterpret_cast and use my virtual memory without copying it first?

If there is not, why force the user to do a reinterpret_cast by returning a typed pointer?

Please answer all the questions for bonus points :)


Solution

  • Returning a char* seems to be just a (peculiar) design decision of the boost::iostreams implementation.

    Other APIs, such as Boost.Interprocess, return void*.

    As observed by sehe, the UNIX mmap specification (and malloc) use void* as well.

    This question is somewhat a duplicate of "void* or char* for generic buffer representation?"
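The practical difference between the two return types is only which cast the caller has to write. A minimal sketch, using hypothetical stand-in functions (data_as_void, data_as_char, from_void and from_char are illustrative names, not Boost APIs) over storage that really holds std::uint32_t objects:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two API styles: same address, different type.
inline const void* data_as_void(const std::uint32_t* storage) { return storage; }
inline const char* data_as_char(const std::uint32_t* storage) {
    return reinterpret_cast<const char*>(storage);
}

// From void*, a static_cast is enough.
inline const std::uint32_t* from_void(const void* p) {
    assert(p != nullptr);
    return static_cast<const std::uint32_t*>(p);
}

// From char*, the language requires a reinterpret_cast.
inline const std::uint32_t* from_char(const char* p) {
    assert(p != nullptr);
    return reinterpret_cast<const std::uint32_t*>(p);
}
```

Both routes yield the same pointer value; neither cast checks that the bytes behind the pointer actually form valid, correctly aligned objects of the target type.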

    As a note of caution, the translation layer mentioned by Lightness in another answer may be needed when the memory is written on one architecture and read on a different one. Endianness is easy to solve with a conversion type, but alignment needs to be considered as well.

    About static_cast, http://en.cppreference.com/w/cpp/language/static_cast says:
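One way to sidestep both concerns at once is to decode values byte by byte from the char* that data() returns. This is a sketch under the assumption that the file stores big-endian 32-bit values; load_be32 is a hypothetical helper, not part of Boost:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode a big-endian 32-bit value starting at p.
// Only single bytes are read, so p may have any alignment, and the result
// does not depend on the byte order of the host.
inline std::uint32_t load_be32(const char* p) {
    assert(p != nullptr);
    const unsigned char* b = reinterpret_cast<const unsigned char*>(p);
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8)  |  std::uint32_t{b[3]};
}
```

The cost is that every access goes through the helper instead of a plain typed pointer, which is the readability trade-off the question is about.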

    A prvalue of type pointer to void (possibly cv-qualified) can be converted to pointer to any type. If the value of the original pointer satisfies the alignment requirement of the target type, then the resulting pointer value is unchanged, otherwise it is unspecified. Conversion of any pointer to pointer to void and back to pointer to the original (or more cv-qualified) type preserves its original value.

    So if the file to be memory mapped was created on a different architecture with different alignment rules, reading through the cast pointer may fail (e.g. with a SIGBUS), depending on the architecture and the OS.
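A defensive approach is to check the alignment before casting, and to fall back to a byte copy when it does not hold. The following is a rough sketch; is_aligned_for and load_unaligned are hypothetical helper names, not Boost or standard library functions:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: true if p satisfies the alignment requirement of T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: read a T stored at p by copying its bytes.
// std::memcpy has no alignment requirement, so this works at any offset.
template <typename T>
T load_unaligned(const char* p) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    assert(p != nullptr);
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}
```

The copy only addresses alignment; the value still comes back in whatever byte order the file was written in.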