
What is returned in this function?


If I interpret this correctly, it takes two long integers as input, creates an array, and subtracts an integer from the array. But I thought you could not subtract an integer from an array.

What does this function actually return?

int *ivector (long nl, long nh)
/* allocate an int vector with subscript range v[nl..nh] */
{
  int *retval;

  retval = malloc(sizeof(int)*(nh-nl+1));

  return retval - nl;
}

Solution

  • Before exploring the behavior of this ivector() function, let's review some basic facts about arrays and pointers in C.

    Consider the code

    int a[10];
    int i;
    for(i = 0; i < 10; i++)
        a[i] = 100 + i;
    

    This results in an array in memory which we can think of like this:

        +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
     a: | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 |
        +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
           0     1     2     3     4     5     6     7     8     9
    

    Suppose we now say

    int *ip = a;
    

    Due to the correspondence between arrays and pointers in C, this is equivalent to saying

    int *ip = &a[0];
    

    In any case, we end up with a pointer pointing at the first cell of a, like this:

        +-----+
    ip: |  *  |
        +--|--+
           |
           v
        +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
     a: | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 |
        +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    

    Now, pointer arithmetic: When you add an integer n to a pointer, you "move" the pointer so that it points n elements further along in the underlying array. Make sure you understand all the different ways in which this code prints the number 102:

    int *ip2 = ip + 2;
    printf("%d %d %d %d\n", *(ip + 2), ip[2], *ip2, ip2[0]);
    

    (If you don't understand how all four expressions *(ip+2), ip[2], *ip2, and ip2[0] evaluate to the number 102, please read about this or ask. It's another facet of the "correspondence between arrays and pointers", and it's fundamental to our understanding of the ivector function.)

    Pointer subtraction works, too: the call

    printf("%d %d\n", *(ip2 - 1), ip2[-1]);
    

    prints 101, two slightly different ways.
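
    To make it concrete that pointer arithmetic counts in elements rather than bytes, here is a small sketch. The function names element_at, element_distance, and byte_distance are made up for this illustration; they are not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: builds the array a[] from above and returns the
   element `offset` cells past the first one, reached via a pointer. */
int element_at(int offset)
{
    int a[10];
    int *ip = a;
    int i;

    assert(offset >= 0 && offset < 10);
    for (i = 0; i < 10; i++)
        a[i] = 100 + i;
    return *(ip + offset);      /* same as ip[offset] */
}

/* Hypothetical helper: distance between ip and ip + 2, in elements. */
ptrdiff_t element_distance(void)
{
    int a[10];
    int *ip = a;
    int *ip2 = ip + 2;

    return ip2 - ip;            /* 2, whatever sizeof(int) is */
}

/* Hypothetical helper: the same distance measured in bytes. */
ptrdiff_t byte_distance(void)
{
    int a[10];
    int *ip = a;
    int *ip2 = ip + 2;

    return (char *)ip2 - (char *)ip;    /* 2 * sizeof(int) */
}
```

    Here element_at(2) gives 102 and element_distance() gives 2, while byte_distance() gives 2 * sizeof(int): the compiler scales the integer by the element size for you.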

    Now, let's look at the ivector() function. It's trying to help us simulate arrays that don't necessarily start at 0. If we call

    int *a2 = ivector(0, 9);
    for(i = 0; i <= 9; i++) a2[i] = 100 + i;
    

    we'll end up with an array almost exactly like we had before:

        +-----+
    a2: |  *  |
        +--|--+
           |
           v
        +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
        | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 |
        +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    

    The only difference is that the array itself has no name: it's an anonymous region of memory we got by calling malloc.

    Now suppose we call

    int *a3 = ivector(-5, 5);
    for(i = -5; i <= 5; i++) a3[i] = 100 + i;
    

    Now we end up with an 11-element "array" which we can think of as looking like this:

        +-----+
    a3: |  *-----------------------------+
        +-----+                          |
                                         v
        +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
        | 95  | 96  | 97  | 98  | 99  | 100 | 101 | 102 | 103 | 104 | 105 |
        +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
          -5    -4    -3    -2    -1     0     1     2     3     4     5
    

    Note that we can talk about a3[0], a3[3], a3[-2], etc., just as if this were a regular array with a lower bound of -5. The key to this is that subtraction at the end of ivector you were asking about:

    return retval - nl;
    

    This doesn't subtract anything from the values stored in the array; it's pointer arithmetic again, subtracting nl from the pointer value retval. For the call ivector(-5, 5), this translates to

    return retval - -5;
    

    which of course is equivalent to

    return retval + 5;
    

    so we get a pointer 5 elements into the allocated region.
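
    One consequence of the offset is that the returned pointer is not the one malloc handed out, so it can't be passed to free() directly. The sketch below is one way to handle that, not necessarily how the original code base does it. It repeats ivector() with a check on malloc added, and free_ivector and sum_minus5_to_5 are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* ivector() as in the question, plus a check on the malloc result. */
int *ivector(long nl, long nh)
{
    int *retval;

    assert(nh >= nl);
    retval = malloc(sizeof(int) * (size_t)(nh - nl + 1));
    if (retval == NULL)
        return NULL;
    return retval - nl;
}

/* Hypothetical helper: undo the offset so that free() receives exactly
   the pointer that malloc returned. */
void free_ivector(int *v, long nl)
{
    if (v != NULL)
        free(v + nl);
}

/* Hypothetical demo: fill a vector indexed -5..5 and add up its cells.
   Returns -1 if the allocation fails. */
long sum_minus5_to_5(void)
{
    int *a3 = ivector(-5, 5);
    long sum = 0;
    long i;

    if (a3 == NULL)
        return -1;
    for (i = -5; i <= 5; i++)
        a3[i] = 100 + (int)i;
    for (i = -5; i <= 5; i++)
        sum += a3[i];
    free_ivector(a3, -5);
    return sum;
}
```

    The eleven cells hold 95 through 105, so the sum comes out as 1100.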

    Now suppose we call

    int *a4 = ivector(1, 10);
    for(i = 1; i <= 10; i++) a4[i] = 100 + i;
    

    This is where it all breaks down. The intent is that we end up with a picture like this:

        +-----+
    a4: |  *  |
        +--|--+
           |
           v
              +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
              | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 |
              +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
                 1     2     3     4     5     6     7     8     9    10
    

    But there's a pretty obvious problem: a4 doesn't actually point into the allocated array.

    Based on the way pointer arithmetic works, and the way it's traditionally been implemented by straightforward compilers for straightforward computer architectures, you can convince yourself that this code "ought to" work anyway, and that you'd be able to access a4[1], a4[2], ... up to a4[10]. There'd be horrible problems if you tried to access a4[0], of course, but that's okay, you're not supposed to do that, because a4 is a 1-based array.

    Unfortunately, this last fragment of code is not guaranteed to work. Pointer arithmetic is only defined as long as the result points at an element of an array (either an actual array you declared, or an array-like block of memory you got by calling malloc) or just one past its last element. If you try to compute a pointer outside that range, the behavior is undefined, even if you never try to access the memory that the out-of-bounds pointer points at. In ivector(1, 10), the expression retval - 1 points before the start of the allocated block, so it is already undefined. So most knowledgeable C programmers will advise you not to write code like ivector (or if you do, to call it only for nl <= 0... but of course that pretty much defeats the purpose).
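
    If you do want arrays with an arbitrary lower bound, one way to stay within defined behavior is to keep the pointer malloc returned untouched and subtract the lower bound from the index (ordinary integer arithmetic) at access time. The following is only a sketch under that assumption; the type ivec and the functions ivec_init, ivec_at, ivec_free, and last_of_one_based are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type: an int vector with subscript range nl..nh. */
typedef struct {
    int *base;  /* exactly the pointer malloc returned */
    long nl;    /* lowest valid subscript */
    long nh;    /* highest valid subscript */
} ivec;

/* Allocates storage for subscripts nl..nh; returns 1 on success, 0 on failure. */
int ivec_init(ivec *v, long nl, long nh)
{
    assert(nh >= nl);
    v->base = malloc(sizeof(int) * (size_t)(nh - nl + 1));
    v->nl = nl;
    v->nh = nh;
    return v->base != NULL;
}

/* Returns a pointer to element i. The offset i - nl is an integer in
   0..nh-nl, so the pointer arithmetic never leaves the block. */
int *ivec_at(const ivec *v, long i)
{
    assert(i >= v->nl && i <= v->nh);
    return &v->base[i - v->nl];
}

void ivec_free(ivec *v)
{
    free(v->base);
    v->base = NULL;
}

/* Hypothetical demo: a 1-based vector, filled like a4 above.
   Returns the last element, or -1 if the allocation fails. */
int last_of_one_based(void)
{
    ivec a4;
    long i;
    int last;

    if (!ivec_init(&a4, 1, 10))
        return -1;
    for (i = 1; i <= 10; i++)
        *ivec_at(&a4, i) = 100 + (int)i;
    last = *ivec_at(&a4, 10);
    ivec_free(&a4);
    return last;
}
```

    The cost is a slightly clumsier syntax, *ivec_at(&a4, i) instead of a4[i], in exchange for never forming an out-of-range pointer.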