
C++ - Increasing the size of a file in the system with memory-mapping


Problem and expected result

I'm trying to read in a file and write its data into another, non-empty file. Eventually I'd like to be able to insert data into the file, but that's a different problem. Right now I'm able to write to the file, but the size of the file doesn't change. For example, given this file

test.dat

Name,marker1,marker2,marker3,marker4
barc1,AA,AB,BB,--
barc2,AB,AA,BB,--

If I try to write the following data

test.dat.toAdd

barc3,BB,AB,--,AA
barc4,AB,--,BB,AA
barc5,--,AB,AA,BB
barc6,BB,AA,AB,--
barc7,AA,AB,BB,AA
barc8,BB,AB,AA,BB

Starting at 15 bytes, I'm expecting to get

Name,marker1,mabarc3,BB,AB,--,AA
barc4,AB,--,BB,AA
barc5,--,AB,AA,BB
barc6,BB,AA,AB,--
barc7,AA,AB,BB,AA
barc8,BB,AB,AA,BB

But I actually just end up getting

test.dat

Name,marker1,mabarc3,BB,AB,--,AA
barc4,AB,--,BB,AA
barc5,--,AB,AA,BB
barc

So it only writes up to whatever the size of test.dat originally was.

Code

Here is the code I'm using (my_write.cpp):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <iostream>
#include <cstring>
#include <sstream>
#include <unistd.h>

// Opens up [filename] after [offset] bytes and returns a pointer in memory to the beginning of [filename] + [offset]
char* open_mmap_file_read(const char* filename, long long offset, long long& filesize) {
  // Open the file in read-only mode
  const char* my_file = filename;
  int fd = open(my_file, O_RDONLY);
  if (fd < 0) { std::cerr << "Cannot open the file " << my_file << std::endl; return NULL; }

  // Get the filesize (and possibly error) of the file opening. Filesize is in bytes
  struct stat statbuf;
  int err = fstat(fd, &statbuf);
  if (err < 0) { std::cerr << "Cannot stat the file " << my_file << " because of fstat val " << err << std::endl; return NULL; }
  off_t sz = statbuf.st_size; // Only valid once fstat has succeeded
  std::cout << "File size for " << my_file << " is " << sz << " bytes" << std::endl;

  // Map the file into memory with a pointer to the beginning of the file
  void *fileArea = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
  if (fileArea == MAP_FAILED) { std::cerr << "Cannot map " << my_file << std::endl; return NULL; } // mmap reports failure with MAP_FAILED, not NULL
  std::cout << "File " << my_file << " mapped to address " << fileArea << std::endl;

  // Move the pointer according to the [offset] parameter in memory
  char *localArea = reinterpret_cast<char*>(fileArea);
  if (! localArea) { std::cerr << "Cannot allocate " << sz << " bytes" << std::endl; return NULL; }
  localArea += offset;

  // Store the filesize for use when function returns
  filesize = sz;

  return localArea;
}

// Opens [filename] and writes [data_to_write] in [filename] starting at [offset] bytes. The amount of data to be written is [size_data_to_write]
char* open_mmap_file_write(const char* filename, long long offset, char* data_to_write, long long size_data_to_write, long long& filesize) {
  // Open the file in read/write mode
  const char* my_file = filename;
  int fd = open(my_file, O_RDWR);
  if (fd < 0) { std::cerr << "Cannot open the file " << my_file << std::endl; return NULL; }

  // Get the filesize (and possibly error) of the file opening. Filesize is in bytes
  struct stat statbuf;
  int err = fstat(fd, &statbuf);
  if (err < 0) { std::cerr << "Cannot stat the file " << my_file << " because of fstat val " << err << std::endl; return NULL; }
  off_t sz = statbuf.st_size; // Only valid once fstat has succeeded
  std::cout << "File size for " << my_file << " is " << sz << " bytes" << std::endl;
  filesize = sz;

  // Map the file into memory with a pointer to the beginning of the file
  void *fileArea = mmap(NULL, offset + size_data_to_write, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  //void *fileArea = mmap(NULL, 200, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (fileArea == MAP_FAILED) { std::cerr << "Cannot map " << my_file << std::endl; return NULL; } // mmap reports failure with MAP_FAILED, not NULL
  std::cout << "File " << my_file << " mapped to address " << fileArea << std::endl;

  // Move the pointer according to the [offset] parameter in memory
  char *localArea = reinterpret_cast<char*>(fileArea);
  if (! localArea) { std::cerr << "Cannot allocate " << sz << " bytes" << std::endl; return NULL; }
  localArea += offset;

  // Copy [data_to_write] into the memory mapping of [filename] starting from [offset] bytes
  //ssize_t n = write(fd, data_to_write, size_data_to_write);
  void* copiedArea = memcpy(localArea, data_to_write, size_data_to_write);

  // Return a pointer to the beginning of the file
  return reinterpret_cast<char*>(fileArea);
}

int main(int argc, char *argv[]) {
  if (argc < 4) {
    std::cerr << "Please provide three command-line arguments: file_to_open (string), file_offset (integer), and file_to_add (string)" << std::endl;
    return 1;
  }

  // Get the filename and offset parameters and make sure offset is valid
  const char* file_to_open = argv[1];
  const char* arg_file_offset = argv[2];
  const char* file_to_add = argv[3];
  std::istringstream iss(arg_file_offset);
  long long file_offset;
  if (!(iss >> file_offset)) { std::cerr << "Cannot convert command-line argument " << arg_file_offset << " into an integer for file offset" << std::endl; return 1; }

  long long file_to_add_size; // Pass this by reference to the following function to keep track of filesize
  char* file_to_add_pointer = open_mmap_file_read(file_to_add, 0, file_to_add_size);
  // Print out the first 20 characters from where the file start pointing
  for (int i = 0; i < 20; i++) { std::cout << file_to_add_pointer[i]; } std::cout << std::endl;
  std::cout << std::endl;

  long long file_size;
  char* file_pointer = open_mmap_file_write(file_to_open, file_offset, file_to_add_pointer, file_to_add_size, file_size);
  // Print out all the characters that were written to the memory-mapping
  for (int i = 0; i < file_offset + file_to_add_size; i++) { std::cout << file_pointer[i]; } std::cout << std::endl;
  std::cout << std::endl;

  return 0;
}

It's run with ./my_write test.dat 15 test.dat.toAdd

So my question is: how do I "expand" the file to accommodate the full data that's being written? The data is being written to memory (as we can see from the print in the main function), and it's even being written to the file, but it gets truncated at the original file size. I'm sure it's a simple fix, but I can't find out how to tell the system to grow the file.


Solution

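  • The destination file is simply not large enough to hold the first 15 bytes plus all of test.dat.toAdd. Mapping more bytes than the file contains does not enlarge the file, so anything written past the end of the file never reaches it.

    You can extend the file with ftruncate in open_mmap_file_write, before the call to mmap: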

        // ...
        std::cout << "File size for " << my_file << " is " << sz << " bytes"
                  << std::endl;
    
        // Add this --- start ---
        auto new_minimum_size = offset + size_data_to_write;
        if(new_minimum_size > sz) {
            if(ftruncate(fd, new_minimum_size) == -1) {
                std::perror("ftruncate");
                return nullptr;
            }
            sz = new_minimum_size;
        }
        // Add this --- end ---
    
        filesize = sz;
        //...
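
For completeness, here is a minimal self-contained sketch of the same idea: grow the file with ftruncate first, then map it, copy the data in, and flush. The helper name write_at_offset and its signature are hypothetical (they are not part of the code in the question), and the sketch assumes a POSIX system such as Linux.

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original post): overwrite the contents of
// `path` with `data` starting at byte `offset`, growing the file first if the
// data would run past its current end. Returns true on success.
bool write_at_offset(const char* path, off_t offset, const std::string& data) {
    assert(offset >= 0);
    if (data.empty()) return true;  // mmap rejects a zero-length mapping

    int fd = open(path, O_RDWR);
    if (fd == -1) { std::perror("open"); return false; }

    struct stat st;
    if (fstat(fd, &st) == -1) { std::perror("fstat"); close(fd); return false; }

    // Grow the file before mapping it: the mapping itself cannot extend it.
    const off_t needed = offset + static_cast<off_t>(data.size());
    if (needed > st.st_size && ftruncate(fd, needed) == -1) {
        std::perror("ftruncate");
        close(fd);
        return false;
    }

    // Map the first `needed` bytes; the file is now at least that long.
    const size_t map_len = static_cast<size_t>(needed);
    void* area = mmap(nullptr, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (area == MAP_FAILED) { std::perror("mmap"); close(fd); return false; }
    close(fd);  // the mapping stays valid after the descriptor is closed

    std::memcpy(static_cast<char*>(area) + offset, data.data(), data.size());

    // Flush the changes to the file and release the mapping.
    bool ok = (msync(area, map_len, MS_SYNC) == 0);
    if (!ok) std::perror("msync");
    if (munmap(area, map_len) == -1) { std::perror("munmap"); ok = false; }
    return ok;
}
```

Calling write_at_offset("test.dat", 15, contents), where contents holds the bytes of test.dat.toAdd, should leave test.dat with its first 15 bytes followed by the new data. If the offset lies beyond the current end of the file, ftruncate fills the gap with zero bytes.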