Should one avoid a memory allocation practice that forces a single malloc call?


I wonder whether allocating all the data related to a struct in a single block is good practice. So, let's say we have the following struct, which is meant to store an array of double arrays:

#include <stdio.h>
#include <stdlib.h>

typedef struct Arrays {
    int n;
    int* l;
    double** data;
} Arrays;

One can initialize an example like this:

Arrays* UsualInitialization(int n) {
    // Constructs an array A of length n containing the arrays A[i] = [1,2,...,i]
    Arrays* A = (Arrays*) malloc(sizeof(Arrays));
    A->n = n;
    A->l = (int*) malloc(n * sizeof(int));
    A->data = (double**) malloc(n * sizeof(double*));
    for (int i = 0; i < n; i++) {
        A->l[i] = i + 1;
        A->data[i] = (double*) malloc((i + 1) * sizeof(double));
        for (int j = 0; j < i + 1; j++)
            A->data[i][j] = j + 1;
    }

    return A;
}

Every pointer has a corresponding malloc call. But we can also do something like this:

Arrays* BlockInitialization(int n) {
    // Constructs an array A of length n containing the arrays A[i] = [1,2,...,i]
    Arrays* A = (Arrays*) malloc(sizeof(Arrays) + 
                    n * sizeof(int) + 
                    n * sizeof(double*) +
                    (n * (n+1) / 2) * sizeof(double)
    );
    
    A->n = n;
    A->l = (int*) (A + 1);
    A->data = (double**) (A->l + n);
    A->data[0] = (double*) (A->data + n);

    for (int i = 0; i < n; i++) {
        A->l[i] = i + 1;
        for (int j = 0; j < i + 1; j++) 
            A->data[i][j] = j + 1;
        if (i < n - 1)
            A->data[i+1] = A->data[i] + i + 1;
    }

    return A;
}

Here, there's a single malloc call, and all the other pointers are obtained from the first one by pointer arithmetic and casting. I figure this might be more efficient in certain cases since all the allocated memory is in a single block.
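
As a side note, cleanup also differs between the two versions (I haven't added these functions to the test program below; the names are just illustrative): the usual version needs one free per malloc, while the block version is released with a single free.

void FreeUsual(Arrays* A) {
    // One free for every malloc in UsualInitialization, innermost first.
    for (int i = 0; i < A->n; i++)
        free(A->data[i]);
    free(A->data);
    free(A->l);
    free(A);
}

void FreeBlock(Arrays* A) {
    // Everything lives in the single block returned by BlockInitialization.
    free(A);
}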

So, I'd like to ask: does this practice invoke undefined behavior? Does it affect performance (in some cases)? Does it make the code more obscure?

Below are a print function and a main that show that both initializations give the same result (at least on my machine).

void PrintArrays(Arrays* A) {
    printf("Variable contains %d arrays: \n", A->n);
    for (int i = 0; i < A->n; i++) {
        printf("Array %d (of length %d): [", i, A->l[i]);
        for (int j = 0; j < A->l[i]; j++) {
            printf("%f", A->data[i][j]);
            if (j < A->l[i] - 1)  printf(", ");
        }
        printf("] \n");
    }
}

int main() {
    Arrays* A = BlockInitialization(10);
    PrintArrays(A);

    Arrays* B = UsualInitialization(10);
    PrintArrays(B);

    return 0;
}

Solution

  • So, I'd like to ask: does this practice invoke undefined behavior?

    Not if you do it correctly.

    The only obvious issue here is alignment: for odd n, the int array ends on a boundary that is not a multiple of alignof(double*), so the pointer array (and the doubles that follow it) can end up misaligned. You need to use alignof to figure out how much padding is needed between consecutive sub-blocks.
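
    For illustration, here is one way the padded layout could be computed (a sketch only; it assumes C11 for alignof, and the helper and function names are made up):

    #include <stdalign.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Round offset up to the next multiple of align (align is a power of two). */
    static size_t align_up(size_t offset, size_t align)
    {
      return (offset + align - 1) & ~(align - 1);
    }

    Arrays* AlignedBlockInitialization(int n)
    {
      /* Pad each sub-block so it starts on a suitable boundary. */
      size_t off_l    = align_up(sizeof(Arrays),                 alignof(int));
      size_t off_data = align_up(off_l + n * sizeof(int),        alignof(double*));
      size_t off_dbl  = align_up(off_data + n * sizeof(double*), alignof(double));
      size_t total    = off_dbl + (size_t) n * (n + 1) / 2 * sizeof(double);

      Arrays* A = malloc(total);
      if (A == NULL)
        return NULL;

      char* base = (char*) A;
      A->n    = n;
      A->l    = (int*)     (base + off_l);
      A->data = (double**) (base + off_data);

      double* next = (double*) (base + off_dbl);
      for (int i = 0; i < n; i++)
      {
        A->l[i]    = i + 1;
        A->data[i] = next;
        for (int j = 0; j < i + 1; j++)
          A->data[i][j] = j + 1;
        next += i + 1;
      }
      return A;
    }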

    Does it affect performance (in some cases)?

    Yes, it can make the performance much better on platforms which benefit from cache-friendly memory layouts.

    It'll probably be even better if you lose all the pointers, remove the 1d array and flatten the 2d array, and provide simple accessors to get/index the array(s).

    Even redundant-looking offset calculations are typically faster than multiple chained pointer dereferences. (Also, removing the pointers makes the data even more compact and cache-friendly).

    The minimal version is just

    struct LowerTriangular
    {
      int order;
      double data[];
    };
    

    and as a bonus, this also saves you from managing the alignment.
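
    Wrapping that struct in an allocator and an index helper might look roughly like this (a sketch building on the definition above; the names are illustrative, not a fixed API):

    struct LowerTriangular* lt_create(int order)
    {
      /* Rows 0..order-1 hold 1, 2, ..., order doubles: order*(order+1)/2 in total. */
      size_t count = (size_t) order * (order + 1) / 2;
      struct LowerTriangular* t = malloc(sizeof *t + count * sizeof(double));
      if (t)
        t->order = order;
      return t;
    }

    /* Element j of row i, with 0 <= j <= i < order. */
    static inline double* lt_at(struct LowerTriangular* t, int i, int j)
    {
      return &t->data[(size_t) i * (i + 1) / 2 + j];
    }

    Releasing it is then a single free(t), and indexing is one small offset calculation rather than a chain of pointer dereferences.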

    Does it make the code more obscure?

    Yes, but if the obscure bit is encapsulated in two functions (allocator and deallocator), you can comment it well, test it thoroughly, and then forget about it.