
Misunderstanding of memcpy & mmap


I need to share memory between processes, and I found some sample code here. First, I need to learn how to create a shared memory block and store a string in it. To do that, I used the following code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

void* create_shared_memory(size_t size) {
  // Our memory buffer will be readable and writable:
  int protection = PROT_READ | PROT_WRITE;

  // The buffer will be shared (meaning other processes can access it), but
  // anonymous (meaning third-party processes cannot obtain an address for it),
  // so only this process and its children will be able to use it:
  int visibility = MAP_ANONYMOUS | MAP_SHARED;

  // With MAP_ANONYMOUS there is no backing file, so the file descriptor
  // should be -1 (some implementations require it) and the offset 0;
  // the manpage for `mmap` explains their purpose.
  return mmap(NULL, size, protection, visibility, -1, 0);
}



int main() {
  char msg[] = "hello world!";

  void* shmem = create_shared_memory(1);
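  // %zu is the correct conversion for size_t values such as sizeof results.
  printf("sizeof shmem: %zu\n", sizeof(shmem));
  printf("sizeof msg: %zu\n", sizeof(msg));
  memcpy(shmem, msg, sizeof(msg));
  // %s expects a char*, so convert the void* explicitly.
  printf("message: %s\n", (char*) shmem);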

}

Output:

sizeof shmem: 8
sizeof msg: 13
message: hello world!

In the main function, I create a 1-byte shared memory block (shmem) and try to store 13 bytes (char msg[]) in it. When I print shmem, it prints the whole message. I expected it to print just 1 byte, which in this case is "h", or to produce an error about the memory size at compile time.

My question: am I missing something here? Or is this an implementation issue? Does memcpy overlap here? I would appreciate any brief explanation.

Thanks in advance.


Solution

    1. In printf("message: %s\n", shmem);, the %s specifier says to print the “string” starting at shmem. For this purpose, a string is a sequence of characters ending with the null character. So the printf prints all the bytes it finds at shmem up to the null character. To limit it to at most one character, you can use %.1s instead, or you can explicitly print a character with printf("message: %c\n", * (char *) shmem);.

    2. When you allocate memory with mmap, the system works with memory in units of pages. The size of a page varies from system to system, but it is typically something like 512 or 4096 bytes, not 1. The standard specification for mmap only guarantees that the number of bytes you request is provided. There may be additional bytes accessible beyond this, but you should not rely on them being available. (Even if they appear to be available momentarily, the system might not save them to disk when your program is temporarily swapped out of memory, so they will not be restored when your program is brought back into memory to continue running.)

    3. sizeof(shmem) provides the size of shmem, which is a pointer. So it provides the size of the pointer, which is usually four or eight bytes on modern systems. It does not provide the size of the thing that shmem points to.

    4. In contrast, in sizeof(msg), msg is an array, not a pointer, so sizeof(msg) does provide the size of the array, as you likely intend.

    5. memcpy(shmem, msg, sizeof(msg)); copies 13 bytes (the size of your msg) into shmem. Those thirteen bytes are “hello world!” and a null character (value 0) at the end. memcpy does not have any way of knowing how long the source or destination is except for the length parameter that you pass. So it copies sizeof(msg) bytes. It does not limit itself to the size of the memory pointed to by shmem. It is your job to pass the correct length.

    To answer your question about what happens if you use more bytes than mmap provides, the behavior is undefined. If you go beyond a page boundary, it is most likely that your program will crash because memory beyond that address is not mapped. But you might write bytes to a place in your memory you did not want to, and that can cause any variety of things to happen, because it can damage code or data that your program needs to execute properly.

    In this case, you did not write beyond mapped memory. You asked for 1 byte and were likely given 4096 (or whatever one page on your system is). Then you copied 13 bytes into the buffer and printed them. So everything “worked.”