Tags: c, string, memory, compiler-construction, string-literals

Memory usage of literal strings in C


How does the compiler manage memory when you pass a string literal as a function argument instead of a pointer to an array of chars?

Example:

static const char myString[LENGTH] = "A string";
myFunction(myString);

and:

myFunction("A string");

Does having a static const (which will most likely be stored in ROM) passed via a pointer yield significant benefits regarding RAM usage?

When passing the string literal, is it copied entirely as a local variable of sizeof(myString), or does the compiler "know" to pass it by reference since arrays are always passed by reference in C?


Solution

  • When passing the string literal, is it copied entirely as a local variable of sizeof(myString), or does the compiler "know" to pass it by reference since arrays are always passed by reference in C?

    A string literal is stored as an array with static storage duration, so it is available for the lifetime of the program, and it is subject to the same conversion rule as any other array expression. That is, except when it is the operand of the sizeof or unary & operators, or is a string literal being used to initialize an array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. Thus, in the calls

    myFunction( mystring );
    

    and

    myFunction( "A string" );
    

    both of the arguments are expressions of array type, and neither is the operand of the sizeof or unary & operators, so in both cases the expression decays to a pointer to the first element. As far as the function call is concerned, there's absolutely no difference between the two.

    So let's look at a real-world example (SLES 10, x86_64, gcc 4.1.2):

    #include <stdio.h>
    
    void myFunction( const char *str )
    {
      printf( "str = %p\n", (void *) str );
      printf( "str = %s\n", str );
    }
    
    int main( void )
    {
      static const char mystring[] = "A string";
      myFunction( mystring );
      myFunction( "A string" );
    
      return 0;
    }
    

    myFunction prints out the address and contents of both the string literal and the mystring variable. Here are the results:

    [fbgo448@n9dvap997]~/prototypes/literal: gcc -o literal -std=c99 -pedantic -Wall -Werror literal.c
    [fbgo448@n9dvap997]~/prototypes/literal: ./literal
    str = 0x400654
    str = A string
    str = 0x40065d
    str = A string
    

    Both the string literal and the mystring array are being stored in the .rodata (read-only) section of the executable:

    [fbgo448@n9dvap997]~/prototypes/literal: objdump -s literal
    ...
    Contents of section .rodata:
     40063c 01000200 73747220 3d202570 0a007374  ....str = %p..st
     40064c 72203d20 25730a00 41207374 72696e67  r = %s..A string
     40065c 00412073 7472696e 6700               .A string.
    ...
    

    The static keyword in the declaration of mystring tells the compiler that the memory for mystring should be set aside at program start and held until the program terminates. The const keyword says that memory should not be modifiable by the code. In this case, sticking it in the .rodata section makes perfect sense.

    This means that no additional memory is allocated for mystring at runtime; it's already allocated as part of the image. In this particular case, for this particular platform, there's absolutely no difference between using one or the other.

    If I don't declare mystring as static, as in

    int main( void )
    {
      const char mystring[] = "A string";
      ...
    

    then we get:

    str = 0x7fff2fe49110
    str = A string
    str = 0x400674
    str = A string
    

    meaning that only the string literal is being stored in .rodata:

    Contents of section .rodata:
     40065c 01000200 73747220 3d202570 0a007374  ....str = %p..st
     40066c 72203d20 25730a00 41207374 72696e67  r = %s..A string
     40067c 00                                   .
    

    Since it's declared local to main and not declared static, mystring has automatic storage duration; in this case, that means memory will be allocated from the stack at runtime and held for the lifetime of mystring's enclosing scope (i.e., the main function). As part of the declaration, the contents of the string literal are copied into the array. Since it's allocated from the stack, the array is modifiable in principle, but the const keyword tells the compiler to reject any code that attempts to modify it.