Tags: c, arrays, pointers, multidimensional-array, variable-length-array

Return a malloc’ed matrix while being able to use subscript notation


I have an exercise where I am supposed to use fixed-size arrays and in/out parameters to operate on matrices (add, scanf, print, etc.), but I’d like to do it on arbitrary-size matrices and return them rather than adding more (in/)out parameters each time (thus possibly allowing a more “functional” style).

Since I want to return them, I suppose I need malloc to keep the array in memory past the function scope. Since I want to use multidimensional subscript notation (mat[x][y] rather than mat[x*len+y] or *(mat+x*len+y)), I guess I should use some kind of VLA or cast… yet it seems that casting to an array type is forbidden (but I’m often going to return pointers, and how do I use subscript notation on them if I can’t cast?), and the compiler tells me that I “may not initialize a variable-sized object” (even if it’s not directly an array but a pointer to an array) when I use this notation:

int *tab[x][y]=malloc(x*y*sizeof(int));

I also get “invalid initializer” if I replace x and y with constant values like 3 by hand.

I spent almost a week searching; maybe that’s impossible and I should just move on… I also found this notation, which looks to me like function-pointer notation, unless it is a way to give the * operator priority…

int (*tab)[x][y]=malloc(x*y*sizeof(int));

However, I’m not sure I understand this notation, since I then get random values when printing/filling arrays this way.

Previously I tried to use VLAs (variable-length arrays) and the GNU extension for forward-declaring the array lengths as parameters:

void
printMat (int h, int w; int tab[h][w], int h, int w)
{
   /* code using tab[x][y] */
}

but I soon realized I needed to deal with pointers and malloc anyway, for an “add” function that adds two matrices and returns a pointer to a new malloc’ed matrix…


In case I wasn’t specific enough, I’d especially like to know how I should declare the arguments and the return type so that I can use them as multidimensional arrays without an intermediate variable, while actually passing a pointer (that’s already what passing a normal multidimensional array as a parameter does anyway, right?).


Okay, after many tests and tries it now works as I intended, even if I’m not sure I understood everything exactly, especially what is a pointer and what is not (I may have confused myself by trying to figure this out with gdb; I should probably investigate further whether gdb treats a normal one- or multidimensional array as an address, etc.), and my sleep and concentration are not at their best today.

Now I’d like a proper answer to the second part of my initial question: how do I return? Is there a proper generic type (other than a meaningless void*) that is appropriate for a pointer to a 2-dimensional array (something like int(*)[][], but that would work)? If that is too generic, what’s the proper way to cast the returned pointer so I can use multidimensional subscript notation on it? Is (int(*)[3][3]) correct?

However, if I get nothing satisfactory for this (a well-justified “it’s impossible in C” is fine, I guess), I’ll accept @JohnBod’s current answer as solving the problem, since it confirmed that malloc’ing a multidimensional VLA works, with a complete and explanatory answer on multidimensional arrays that fully answers the first part of the question and gives several pointers toward the second (if there is an answer).

#include <stdio.h>
#include <stdlib.h>

void
print_mat (int x, int y; int mat[x][y], int x, int y)
{
  for (int i = 0; i < x; i++)
    {
      for (int j=0; j < y ; j++)
        printf("%d ", mat[i][j]);
      putchar('\n');
    }
  putchar('\n');
}

void*
scan_mat (int x, int y)
{
  int (*mat)[x][y]=malloc(sizeof(*mat));
  for (int i = 0; i < x ; i++)
    for (int j = 0; j < y; j++)
      {
        printf("[%d][%d] = ", i, j);
        scanf("%d", &((*mat)[i][j]));
      }
  return mat;
}

void*
add_mat (int x, int y; int mat1[x][y], int mat2[x][y], int x, int y)
{
  int (*mat)[x][y]=malloc(sizeof(*mat));
  #pragma GCC ivdep
  for (int i = 0; i < x ; i++)
    for (int j = 0; j < y; j++)
      (*mat)[i][j]=mat1[i][j]+mat2[i][j];
  return mat;
}

int
main ()
{
  int mat1[3][3] = {1, 2, 3,
                    4, 5, 6,
                    7, 8, 9},
    (*mat2)[3][3] = scan_mat(3, 3);
  print_mat(mat1, 3, 3);
  print_mat(*mat2, 3, 3);
  print_mat((int(*)[3][3])add_mat(mat1, *mat2, 3, 3), 3, 3); // both appear to work… array decay?
  print_mat(*(int(*)[3][3])add_mat(mat1, *mat2, 3, 3), 3, 3);
  printf("%d\n", (*(int(*)[3][3])add_mat(mat1, *mat2, 3, 3))[2][2]);
  return 0;
}

and the input/output:

[0][0] = 1
[0][1] = 1
[0][2] = 1
[1][0] = 1
[1][1] = 1
[1][2] = 1
[2][0] = 1
[2][1] = 1
[2][2] = 1
1 2 3 
4 5 6 
7 8 9 

1 1 1 
1 1 1 
1 1 1 

2 3 4 
5 6 7 
8 9 10 

2 3 4 
5 6 7 
8 9 10 

10

Solution

  • If you want to allocate a buffer of type T, the typical procedure is

    T *ptr = malloc( sizeof *ptr * N ); // sizeof *ptr == sizeof (T)
    

    You're allocating enough space for N elements of type T.

    Now let's replace T with an array type, R [M]:

    R (*ptr)[M] = malloc( sizeof *ptr * N  ); // sizeof *ptr == sizeof (R [M])
    

    You're allocating enough space for N elements of type R [M]. In other words, you've just allocated enough space for an N by M array of R. Note that the semantics are exactly the same as for the array of T above; all that's changed is the type of ptr.

    Applying that to your example:

    int (*tab)[y] = malloc( sizeof *tab * x );
    

    You can then index tab as you would any 2D array:

    tab[i][j] = new_value(); // with 0 <= i < x and 0 <= j < y
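
As a minimal sketch of that allocation (not part of the original answer): the helper name make_counting_matrix and its fill pattern are made up for illustration, and the code assumes a compiler with VLA support (C99, or a later standard where VLAs are available).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a rows-by-cols matrix of int as one
   contiguous block and stores i * cols + j in element [i][j].
   Returns NULL if the allocation fails; the caller must free() the result. */
void *make_counting_matrix(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);  /* a VLA dimension must be positive */

    /* rows elements, each of type int [cols] */
    int (*tab)[cols] = malloc(sizeof *tab * rows);
    if (tab == NULL)
        return NULL;

    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            tab[i][j] = (int)(i * cols + j);  /* ordinary 2D subscripting */

    return tab;
}
```

A caller that knows the column count can store the result in something like `int (*m)[4]` and then write `m[i][j]`; a single `free(m)` releases the whole matrix.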
    

    Edit

    Answering the comment:

    Yet I’m still not sure I understand: what’s the meaning of the “(*tab)” syntax? It’s not a function pointer, I guess, but why wouldn’t *tab without parentheses work? What’s the actual difference in meaning? Why doesn’t it work, and what changes then?

    The subscript [] and function call () operators have higher precedence than unary *, so a declaration like

    int *a[N];
    

    is parsed as

    int *(a[N]);
    

    and declares a as an array of pointers to int. To declare a pointer to an array, you must explicitly group the * operator with the identifier, like so:

    int (*a)[N];
    

    This declares a as a pointer to an array of int. The same rule applies to function declarations. Here's a handy summary:

    T *a[N];    // a is an N-element array of pointers to T
    T (*a)[N];  // a is a pointer to an N-element array of T
    T *f();     // f is a function returning pointer to T
    T (*f)();   // f is a pointer to a function returning T
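
To see the first two lines of that summary in use, here is a small sketch. The function names and the value N = 4 are invented for this example; the point is only that each function body mirrors its parameter declaration.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4 };  /* arbitrary length chosen for this sketch */

/* a is an N-element array of pointers to int (as a parameter it decays
   to int **): subscript first, then dereference each element. */
int sum_array_of_pointers(int *a[N])
{
    assert(a != NULL);
    int total = 0;
    for (size_t i = 0; i < N; i++)
        total += *a[i];
    return total;
}

/* a is a pointer to an N-element array of int: dereference first,
   then subscript the array it points to. */
int sum_pointer_to_array(int (*a)[N])
{
    assert(a != NULL);
    int total = 0;
    for (size_t i = 0; i < N; i++)
        total += (*a)[i];
    return total;
}
```

Given `int data[N]`, the second function is called as `sum_pointer_to_array(&data)`, while the first needs a separate array holding the address of each element.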
    

    In your code,

    int *tab[x][y]=malloc(x*y*sizeof(int));
    

    declares tab as a 2D array of pointers, not as a pointer to a 2D array, and a call to malloc(...) is not a valid initializer for a 2D array object.

    The syntax

    int (*tab)[x][y]=malloc(x*y*sizeof(int));
    

    declares tab as a pointer to a 2D array, and a call to malloc is a valid initializer for it.

    But...

    With this declaration, you'll have to explicitly dereference tab before indexing into it, like so:

    (*tab)[i][j] = some_value();
    

    You're not indexing into tab, you're indexing into what tab points to.
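
A short sketch of the pointer-to-2D-array form, assuming VLA support; make_filled_matrix is a hypothetical name used only here. Note that with this declaration `sizeof *tab` is already the size of the whole matrix, and that `tab[i][j]` would name row j of the i-th whole matrix rather than a single int.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates one rows-by-cols matrix of int and sets
   every element to value. Returns NULL if the allocation fails. */
void *make_filled_matrix(size_t rows, size_t cols, int value)
{
    assert(rows > 0 && cols > 0);  /* VLA dimensions must be positive */

    /* tab points to ONE object of type int [rows][cols] */
    int (*tab)[rows][cols] = malloc(sizeof *tab);
    if (tab == NULL)
        return NULL;

    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            (*tab)[i][j] = value;  /* dereference first, then subscript */

    return tab;
}
```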

    Remember that in C, declaration mimics use - the structure of a declarator in a declaration matches how it will look in the executable code. If you have a pointer to an int and you want to access the pointed-to value, you use the unary * operator:

    x = *ptr;
    

    The type of the expression *ptr is int, so the declaration of ptr is written

    int *ptr;
    

    The same goes for arrays: if the ith element of an array has type int, then the expression arr[i] has type int, and thus the declaration of arr is written as

    int arr[N];
    

    Thus, if you declare tab as

    int (*tab)[x][y] = ...;
    

    then to index into it, you must write

    (*tab)[i][j] = ...;
    

    The method I showed avoids this. Remember that the array subscript operation a[i] is defined as *(a + i) - given an address a, offset i elements (not bytes!) from a and dereference the result. Thus, the following relationship holds:

    *a == *(a + 0) == a[0]
    

    This is why you can use the [] operator on a pointer expression as well as an array expression. If you allocate a buffer as

    T *p = malloc( sizeof *p * N );
    

    you can access each element as p[i].
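
The equivalence between p[i] and *(p + i) can be checked with a small sketch; both function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints using the subscript operator on a pointer. */
int sum_with_subscript(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Same loop written with explicit pointer arithmetic: p + i moves
   i elements (not bytes) past p, and * reads the int stored there. */
int sum_with_arithmetic(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += *(p + i);
    return total;
}
```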

    So, given a declaration like

    T (*a)[M];
    

    we have the relationship

    (*a)[i] == (*(a + 0))[i] == (a[0])[i] == a[0][i]
    

    Thus, if we allocate the array as

    T (*a)[M] = malloc( sizeof *a * N );
    

    then we can index each element of a as

    a[i][j] = some_value();
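
Putting this together for the "add" function from the question, one possible sketch (not the asker's code, and only one way to do it) returns void * and lets the caller pick the row type. It assumes VLA support and uses the standard parameter order, with dimensions first, instead of the GNU forward-declaration extension. The name add_matrices is hypothetical. As far as I know, a function's return type cannot name an array bound taken from its own parameters, which is why void * is used here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated rows-by-cols matrix
   holding a + b element by element, or NULL if the allocation fails.
   The caller owns the result and must free() it. */
void *add_matrices(size_t rows, size_t cols,
                   int a[rows][cols], int b[rows][cols])
{
    assert(rows > 0 && cols > 0);  /* VLA dimensions must be positive */

    int (*sum)[cols] = malloc(sizeof *sum * rows);
    if (sum == NULL)
        return NULL;

    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum[i][j] = a[i][j] + b[i][j];

    return sum;
}
```

The caller converts the result implicitly, with no cast: `int (*s)[3] = add_matrices(2, 3, a, b);` and then uses `s[i][j]` directly before calling `free(s)`.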