Search code examples
clinuxmemorymemory-managementmmap

Defining variable in a memory-mapped file region in C to store in-memory tree using mmap


Background:

Suppose I have an in-memory b-tree (not b+ tree) which its nodes are declared like this:

struct node
{
int* keys;
struct node** children;
int currentNumOfKeys;
char isLeaf;
}

What I understand so far:

I've read using mmap() one could just map a memory region into a file and store the data, forgetting all about persistency and the next time you load the program, all the pointers are valid! Given that you either map the file into the exact same start address, or use relative offsets for your pointers!


Questions:

Now, if I want to take advantage of this, How can I initialize an struct object inside the mapped region?
Normally, using something like struct node* root = (struct node*)malloc( sizeof(struct node) ); would give me arbitrary memory address, wherever OS wants.

I want something like this:

1- Initialize the first node of the tree in the mapped region
2- Change my struct node** children to int children[MAX_DEGREE] > so I can store the offset for the children nodes, from this root node
3- Initialize and store the next nodes in the mapped region, in some fashion that the next time the program starts, mapping the file recreate the tree without the need of sorting it


Notes:

  1. In every step, I want to avoid using swap files to make up for memory shortage or copying data into the disk (besides the mapped file obviously) in all kinds.
  2. I would change struct node** children; into something like int children[MAX_DEGREE]; to store relative offset of every child node, from the root node.

Edited question to be more focused on one problem/question

Solution

  • The problem with mmap is that it can be seen by any program that attaches to the shared memory.

    program1    shm    program2
                XXX
    &XXX ->     &XXX   this is program1's &XXX
                       In program2, &xxx is different
    

    One way around it is to use arrays with indices instead of pointers. So instead of

    struct node
    {
       struct node* first_child;
       struct node* next_node;
       ... some data;
    };
    

    use something like

    struct node
    {
        unsigned short ix_next_node;
        unsigned short ix_first_child;
        ... some data
    };
    

    If backward navigation is required, then a previous and parent node will be required. Assuming there aren't more than 65534 nodes. Leave the value 65535 to mean there is no node. This way, children[MAX_DGREE] is not required since it is effectively a singly linked list. So if the tree looks like

    +-- 0
    |   +-- 4
    |   |   +-- 6
    |   |   +-- 7
    |   |
    |   +-- 5
    |
    +-- 1
    |   +-- 2
    |
    +-- 3
    

    The parent, prev node, next node and child would be

    Index parent prev  next  first
    0     65535  65535 1     4
    1     65535  0     3     65545
    2     1      65545 65535 65535
    3     65535  1     65535 65535
    4     0      65535 5     65535
    5     0      4     65535 65535
    6     4      65535 7     65535
    7     4      6     65535 65535
    

    All you need to do now is to write a program to do it.