Tags: c, realloc, exit-code

realloc() causes program to stop


I am learning C, but I have run into a small problem using the realloc function.

The code below is meant to create two structs, each containing a list of characters, and then add the second list of characters onto the end of the first, reallocating the memory in order to do so.

However, the code only works up to the realloc call. It then finishes with exit code 0, without running the rest of the program.

I am unable to figure out what is happening here, and any help would be greatly appreciated.

#include <stdio.h>
#include <stdlib.h>

typedef struct String {
    char* chars;
} String;

String createString(char* chars) {
    String res;
    res.chars = chars;

    return res;
}

int main() {
    printf("Starting program!\n");

    String a = createString("Hello ");
    String b = createString("There");

    puts(a.chars);
    puts(b.chars);

    int aLength = sizeof(&a.chars) / sizeof(char);
    int bLength = sizeof(&b.chars) / sizeof(char);

    a.chars = (char*) realloc(a.chars, aLength + bLength);

    // Add b to the end of a
    for (int i = 0; i < bLength; i++) {
        a.chars[i + aLength] = b.chars[i];
    }

    puts("Complete");
    puts(a.chars);

    return 0;
}


Solution

  • C is a relatively low-level language that keeps us close to the hardware, so its rules about where variables are stored can be confusing at first, especially where strings are concerned. A solid understanding of these rules is essential if you plan on becoming good at C -- as we assume you intend to.

    As far as strings are concerned, there are three flavours of storage that you need to be aware of: literals, 'automatic' variables, and dynamic allocation. These are three very different animals, and a pointer into one kind must never be treated as if it pointed into another.

    1. Literals. When you write a constant string such as "Hello " or "There" in your source, you are creating a string 'literal'. Literals are hard-coded into the final executable and are usually stored in a read-only segment. Modifying one is undefined behaviour in C; on most modern operating systems the attempt ends in a segfault. Nor would you want to modify one, because identical literals may share storage and you could be overwriting data used elsewhere in the program.
    2. 'Automatic' variables are allocated in CPU registers or on the stack. When C was first conceived this space was severely restricted. Today it is less so, although it remains limited. When you declare a character array with an explicit size inside a function, you are creating an automatic variable: char a[16];

      Doing so is simple and convenient, since you are not responsible for cleaning up after yourself, but it has several downsides. The size of the array cannot be changed after its creation and, more significantly, the array disappears when the function in which it is declared returns, so you cannot return a pointer to it from that function.

      • Later, you may investigate facilities such as the non-standard alloca, which lets the size of a stack allocation be chosen at run time. The storage still disappears when the function returns.
    3. Dynamic allocation stores variables on the 'heap', which represents the bulk of the memory available. This is where malloc, realloc and co. come into play. Creating and using these variables is more involved, but also more powerful. As your code already shows, these functions return pointers (addresses) to the memory requested.

      One has to be especially careful about instantiating these though. That is precisely where your code runs astray.
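
    As a rough illustration of the three kinds of storage, here is a minimal sketch. The names heap_copy and storage_demo are made up for this example and are not part of your program or of any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a heap copy of src that the caller may
// modify, resize with realloc, and must eventually free.
// Returns NULL if malloc fails.
char *heap_copy(const char *src) {
    size_t len = strlen(src);
    char *copy = malloc(len + 1);          // +1 for the terminator
    if (copy != NULL)
        memcpy(copy, src, len + 1);        // copies the terminator too
    return copy;
}

// Hypothetical demo: touches each kind of storage once and returns the
// final length of the heap string (0 if the first allocation failed).
size_t storage_demo(void) {
    const char *literal = "Hello ";        // 1. literal: read-only, never freed
    char automatic[16] = "Hello ";         // 2. automatic: writable, fixed size
    char *dynamic = heap_copy(literal);    // 3. dynamic: writable, resizable
    size_t result = 0;

    automatic[0] = 'J';                    // fine: the array belongs to this function
    printf("%s| %s|\n", literal, automatic);

    if (dynamic != NULL) {
        dynamic[0] = 'C';                  // fine: the heap block is ours
        // Fine: this pointer came from malloc, so realloc may resize it.
        // sizeof "There" is 6: five characters plus the terminator.
        char *tmp = realloc(dynamic, strlen(dynamic) + sizeof "There");
        if (tmp != NULL) {
            dynamic = tmp;
            strcat(dynamic, "There");      // room was reserved above
        }
        result = strlen(dynamic);
        printf("%s\n", dynamic);
        free(dynamic);                     // only heap memory is ever freed
    }
    return result;
}
```

    Writing through literal, or passing literal or automatic to realloc or free, would be undefined behaviour, which is why the sketch never does either.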

    You clearly understand that a string is merely a sequence of characters. When you intend to modify a string, you must either use an 'automatic' variable of sufficient size to contain the longest series you intend to accommodate, or use the heap.

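    What you must not do is take the address of a literal and then try to write into it or append to it. That is exactly what realloc(a.chars, ...) attempts in your program: a.chars points at the literal "Hello ", not at memory obtained from malloc, calloc or realloc, so the call has undefined behaviour. In your case the program simply stops at that point.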

    You need to modify createString to take care of that by creating the necessary storage and copying the source string into your new buffer. Within your function you are allocating res on the stack and then returning that variable by value, which means that the compiler will create a copy. However, it will only copy the structure itself; it will not allocate any memory for the pointer or copy the characters it points to. Since your structure is only one pointer wide, returning by value may work as you intend, but it may be more robust to do things a little differently. The idiom you are using is somewhat of a mixture of C++ and C, which is likely to cause problems.

    In C, it is perhaps better to separate the declaration of the variable from its instantiation.

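    Array Size. In general you cannot use sizeof(x) to get the length of a C string. Applied to a genuine array, sizeof gives the size of the array in bytes (terminator included), but applied to a pointer it gives the size of the pointer itself. Your sizeof(&a.chars) is the size of a pointer to a pointer, typically 8 on a 64-bit system, and it has nothing to do with the text. If you use the standard C convention of terminating your strings with a null character ('\0'), then the strlen() function will serve the purpose you intend. When you write a literal, the compiler automatically adds that terminator for you.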
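
    The difference can be sketched as follows. The function names are invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: once a string is only reachable through a pointer,
// strlen is the way to find its length. sizeof s here would be the size
// of the pointer itself (commonly 4 or 8 bytes), whatever the text is.
size_t length_via_pointer(const char *s) {
    return strlen(s);              // counts characters up to the terminator
}

// Hypothetical helper: sizeof applied to a real array reports its storage.
size_t array_bytes(void) {
    char greeting[] = "Hello ";    // array of 7 chars: 6 visible plus '\0'
    return sizeof greeting;        // 7: the terminator is included
}
```

    Note that sizeof is evaluated by the compiler from the type of its operand, whereas strlen walks the characters at run time.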

    Alternatively, you can keep track of the length -- and possibly the capacity -- yourself, and store that within the structure.
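
    One possible shape for such a structure is sketched below. The type SizedString, its functions, and the doubling growth policy are all assumptions made for this example; they are one option among many. The sketch also assumes both strings were created successfully and that a and b are different objects.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical variant of String that remembers its own size.
typedef struct SizedString {
    char  *chars;     // heap buffer, null-terminated when non-NULL
    size_t length;    // characters in use, terminator not counted
    size_t capacity;  // bytes allocated for chars
} SizedString;

// Returns 1 on success, 0 if the allocation fails.
int sizedCreate(SizedString *s, const char *src) {
    size_t len = strlen(src);
    s->chars = malloc(len + 1);
    if (s->chars == NULL) {
        s->length = s->capacity = 0;
        return 0;
    }
    memcpy(s->chars, src, len + 1);      // copies the terminator too
    s->length = len;
    s->capacity = len + 1;
    return 1;
}

// Appends b to a. Returns 1 on success, 0 on failure (a is left unchanged).
int sizedAppend(SizedString *a, const SizedString *b) {
    size_t needed = a->length + b->length + 1;
    if (needed > a->capacity) {
        size_t newCap = a->capacity * 2;  // assumed policy: grow by doubling
        if (newCap < needed)
            newCap = needed;
        char *tmp = realloc(a->chars, newCap);
        if (tmp == NULL)
            return 0;
        a->chars = tmp;
        a->capacity = newCap;
    }
    memcpy(a->chars + a->length, b->chars, b->length + 1);
    a->length += b->length;
    return 1;
}

void sizedDestroy(SizedString *s) {
    free(s->chars);                       // free(NULL) is harmless
    s->chars = NULL;
    s->length = s->capacity = 0;
}
```

    Storing the length avoids calling strlen again on every append, and spare capacity means that not every append needs a realloc.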

    You will also need to craft a complementary destroyString function to return the memory to the system, else your application may cause a memory 'leak' -- failing to free memory it allocated.

    Given that you are choosing this way of working, it would be consistent then to also make an appendString function to perform the corresponding task.

    In each case, it works more reliably to pass a pointer to the object to the function -- just like C++ and such do behind the scenes.
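
    To see why the pointer matters, consider this small sketch. The two clear functions are hypothetical and exist only to show the difference.

```c
#include <stddef.h>

typedef struct String {
    char *chars;
} String;

// Receives a copy of the structure: the caller's object is not affected.
void clearByValue(String s) {
    s.chars = NULL;        // changes only the local copy
}

// Receives the address of the caller's object: the change is visible there.
void clearByPointer(String *s) {
    s->chars = NULL;       // changes the caller's structure
}
```

    A function such as appendString has to store the new pointer returned by realloc in the caller's structure, so it needs the second form.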

    So, within main: declare and initialize the objects separately, then use and destroy them appropriately.

    String a, b;
    
    createString( &a, "Hello " );
    createString( &b, "There" );
    
    appendString( &a, &b );
    
    puts( a.chars );
    
    destroyString( &a );
    destroyString( &b );
    

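    And earlier in the file, define the implementation functions. They need <string.h> for strlen, in addition to the <stdlib.h> you already include for malloc, realloc and free.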

    #include <string.h>   // strlen

    void createString( String *s, const char* chars ) {
       size_t len = strlen( chars );   // strlen does not count the terminating null
       s->chars = malloc( len + 1 );   // so reserve one extra byte for it
       if( s->chars ) {  // make sure the pointer is valid: malloc may fail
          for( size_t i = 0; i <= len; ++i )  // make sure to copy the terminator as well
             s->chars[i] = chars[i];
       }
    }
    
    void appendString( String *a, const String *b ) {
       if( !a->chars || !b->chars )   // an earlier createString may have failed
          return;
       size_t alen = strlen( a->chars ), 
              blen = strlen( b->chars );
       char *tmp = realloc( a->chars, alen + blen + 1 );
       if( tmp ) {   // on failure realloc returns NULL and leaves a->chars untouched
          a->chars = tmp;   // realloc will have copied the buffer for you
          for( size_t i = 0; i <= blen; ++i )
             a->chars[alen+i] = b->chars[i];  // start at the position of the terminator in 'a'
       }
    }
    
    void destroyString( String *s ) {
       free( s->chars );   // free( NULL ) is allowed and does nothing
       s->chars = NULL;    // guard against accidental reuse or a double free
    }
    

    Naturally, there are many significant further improvements to make to this code. We hope you will have fun discovering them...