I'm trying to understand some of the intricacies of the preprocessor and of the C compiler (specifically GNU GCC) with respect to string literals. Is it more efficient to assign a string literal to a global variable, so that it occupies only one place in memory, than to use a #define preprocessor directive?
In this example, the string literal is in one place in memory and is accessed several times:
#include <stdio.h>
#include <string.h>
char OUTPUT[20] = "Hello, world!!!";
int main (){
printf("%s is %d characters long.\n", OUTPUT, (int) strlen(OUTPUT));
return 0;
}
vs doing it with the preprocessor:
#include <stdio.h>
#include <string.h>
#define OUTPUT "Hello, world!!!"
int main (){
printf("%s is %d characters long.\n", OUTPUT, (int) strlen(OUTPUT));
return 0;
}
which translates as:
#include <stdio.h>
#include <string.h>
#define OUTPUT "Hello, world!!!"
int main (){
printf("%s is %d characters long.\n", "Hello, world!!!", (int) strlen("Hello, world!!!"));
return 0;
}
What I'm really asking is: in the last two examples, which use the preprocessor, does the compiler keep two separate instances of "Hello, world!!!" in two separate memory locations, or is it smart enough to make them one memory location?
If it is two separate memory locations, then isn't it more resource-friendly to use a global variable rather than macro expansion for program constants?
Your compiler should be smart enough to store one instance of the string. You can verify this by checking the assembly outputs for your programs.
For example, using GCC:
Assume your first example is called "global.c".
gcc -Wall -S global.c
.file "global.c"
.globl OUTPUT
.data
.align 16
.type OUTPUT, @object
.size OUTPUT, 20
OUTPUT:
.string "Hello, world!!!"
.zero 4
.section .rodata
.LC0:
.string "%s is %d characters long.\n"
.text
.globl main
.type main, @function
main:
// More code...
Assume your preprocessor example is called "preproc.c".
gcc -Wall -S preproc.c
.file "preproc.c"
.section .rodata
.LC0:
.string "%s is %d characters long.\n"
.LC1:
.string "Hello, world!!!"
.text
.globl main
.type main, @function
main:
// More code...
In both cases, only one copy each of "Hello, world!!!" and "%s is %d characters long.\n" exists. In the first example, the compiler has to reserve space for 20 characters because your code declares a modifiable array. If you changed this
char OUTPUT[20] = "Hello, world!!!";
to
const char * const OUTPUT = "Hello, world!!!";
You would get:
.file "global.c"
.globl OUTPUT
.section .rodata
.LC0:
.string "Hello, world!!!"
.align 8
.type OUTPUT, @object
.size OUTPUT, 8
OUTPUT:
.quad .LC0
.LC1:
.string "%s is %d characters long.\n"
.text
.globl main
.type main, @function
main:
// More code...
Now you are saving space for just the pointer and the string.
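Which approach is better hardly matters in this situation, though I would recommend the preprocessor version because it does not add a global object with external linkage to your program. Note that a macro is not scoped to the main function: it is visible from its #define to the end of the translation unit (or until an #undef).
With optimizations enabled, both versions emit almost identical code.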
global.c (with const char * const OUTPUT):
gcc -Wall -O3 -S global.c
.file "global.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Hello, world!!!"
.LC1:
.string "%s is %d characters long.\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB44:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $15, %ecx
movl $.LC0, %edx
movl $.LC1, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE44:
.size main, .-main
.globl OUTPUT
.section .rodata
.align 8
.type OUTPUT, @object
.size OUTPUT, 8
OUTPUT:
.quad .LC0
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
preproc.c:
gcc -Wall -O3 -S preproc.c
.file "preproc.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Hello, world!!!"
.LC1:
.string "%s is %d characters long.\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB44:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $15, %ecx
movl $.LC0, %edx
movl $.LC1, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE44:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Looking at both main functions, you can see that the instructions are identical.