Why does gcc produce different compiled binaries for programs that use different forms of integer literals?


I was wondering what the difference is between the following declarations when compiled with gcc:

int a = 0b00000100;
int a = 0x04;
int a = 4;

I seem to get a different binary when compiling what appears to be the same number written in different notations. When I run objdump on the results, however, there don't seem to be any differences. Could somebody tell me what's going on?

This is my output:

$ cat testbin.c && echo && cat testbin2.c
#include "stdio.h"
int main () {
  int a = 0b00000100;
  int b = 0x05;
  int c = 6;
  printf("%d - %d - %d\n", a, b, c);
  return (0);
}

#include "stdio.h"
int main () {
  int a = 4;
  int b = 5;
  int c = 6;
  printf("%d - %d - %d\n", a, b, c);
  return (0);
}
$ gcc testbin.c -o testbin
$ gcc testbin2.c -o testbin2
$ md5sum testbin testbin2
fd6aaa31bdf685ea9444e1edc209565e  testbin
3a3fc241bfc2917ee29999b5befecd2a  testbin2
$ objdump -d testbin > testbin.obj && objdump -d testbin2 > testbin2.obj
$ diff testbin.obj testbin2.obj
2c2
< testbin:     file format elf64-x86-64
---
> testbin2:     file format elf64-x86-64
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18)

Notice that the executables are different (they have different hashes), yet objdump -d shows no difference in the disassembly beyond the file name.


Solution

  • I think that the issue has nothing to do with the integer formats and everything to do with the filenames.

    I compiled the following program twice using gcc with no other flags set: first from a source file named FIRST_PROG.c with the executable name COMPILED_1, then from a source file named SECOND_PROG.c with the executable name COMPILED_2:

    int main() {
        return 0;
    }
    

    If you dump the contents of the first executable with hd, at a certain offset you see this:

    00001720  66 72 61 6d 65 5f 64 75  6d 6d 79 5f 69 6e 69 74  |frame_dummy_init|
    00001730  5f 61 72 72 61 79 5f 65  6e 74 72 79 00 46 49 52  |_array_entry.FIR|
    00001740  53 54 5f 50 52 4f 47 2e  63 00 5f 5f 46 52 41 4d  |ST_PROG.c.__FRAM|
    

    Notice that the name of the source file, FIRST_PROG.c, is embedded into the generated executable. Looking at the same location in the second file shows this:

    00001720  66 72 61 6d 65 5f 64 75  6d 6d 79 5f 69 6e 69 74  |frame_dummy_init|
    00001730  5f 61 72 72 61 79 5f 65  6e 74 72 79 00 53 45 43  |_array_entry.SEC|
    00001740  4f 4e 44 5f 50 52 4f 47  2e 63 00 5f 5f 46 52 41  |OND_PROG.c.__FRA|
    

    You can see SECOND_PROG.c is embedded into the binary as well.

    Dumping both executables with objdump -s doesn't show this anywhere, which matches the clean diff you had from your programs. However, using readelf -a to list the contents of the executable that's generated does show this:

    Symbol table '.symtab' contains 66 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
         1: 0000000000400238     0 SECTION LOCAL  DEFAULT    1 
         2: 0000000000400254     0 SECTION LOCAL  DEFAULT    2 
         3: 0000000000400274     0 SECTION LOCAL  DEFAULT    3 
         4: 0000000000400298     0 SECTION LOCAL  DEFAULT    4 
         5: 00000000004002b8     0 SECTION LOCAL  DEFAULT    5 
         6: 0000000000400300     0 SECTION LOCAL  DEFAULT    6 
         7: 0000000000400338     0 SECTION LOCAL  DEFAULT    7 
         8: 0000000000400340     0 SECTION LOCAL  DEFAULT    8 
         9: 0000000000400360     0 SECTION LOCAL  DEFAULT    9 
        10: 0000000000400378     0 SECTION LOCAL  DEFAULT   10 
        11: 0000000000400390     0 SECTION LOCAL  DEFAULT   11 
        12: 00000000004003b0     0 SECTION LOCAL  DEFAULT   12 
        13: 00000000004003d0     0 SECTION LOCAL  DEFAULT   13 
        14: 00000000004003e0     0 SECTION LOCAL  DEFAULT   14 
        15: 0000000000400564     0 SECTION LOCAL  DEFAULT   15 
        16: 0000000000400570     0 SECTION LOCAL  DEFAULT   16 
        17: 0000000000400574     0 SECTION LOCAL  DEFAULT   17 
        18: 00000000004005a8     0 SECTION LOCAL  DEFAULT   18 
        19: 0000000000600e10     0 SECTION LOCAL  DEFAULT   19 
        20: 0000000000600e18     0 SECTION LOCAL  DEFAULT   20 
        21: 0000000000600e20     0 SECTION LOCAL  DEFAULT   21 
        22: 0000000000600e28     0 SECTION LOCAL  DEFAULT   22 
        23: 0000000000600ff8     0 SECTION LOCAL  DEFAULT   23 
        24: 0000000000601000     0 SECTION LOCAL  DEFAULT   24 
        25: 0000000000601020     0 SECTION LOCAL  DEFAULT   25 
        26: 0000000000601030     0 SECTION LOCAL  DEFAULT   26 
        27: 0000000000000000     0 SECTION LOCAL  DEFAULT   27 
        28: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
        29: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   21 __JCR_LIST__
        30: 0000000000400410     0 FUNC    LOCAL  DEFAULT   14 deregister_tm_clones
        31: 0000000000400450     0 FUNC    LOCAL  DEFAULT   14 register_tm_clones
        32: 0000000000400490     0 FUNC    LOCAL  DEFAULT   14 __do_global_dtors_aux
        33: 0000000000601030     1 OBJECT  LOCAL  DEFAULT   26 completed.7585
        34: 0000000000600e18     0 OBJECT  LOCAL  DEFAULT   20 __do_global_dtors_aux_fin
        35: 00000000004004b0     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
        36: 0000000000600e10     0 OBJECT  LOCAL  DEFAULT   19 __frame_dummy_init_array_
        37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS FIRST_PROG.c
        38: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
        39: 0000000000400698     0 OBJECT  LOCAL  DEFAULT   18 __FRAME_END__
        40: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   21 __JCR_END__
        41: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS 
        42: 0000000000600e18     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_end
        43: 0000000000600e28     0 OBJECT  LOCAL  DEFAULT   22 _DYNAMIC
        44: 0000000000600e10     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_start
        45: 0000000000400574     0 NOTYPE  LOCAL  DEFAULT   17 __GNU_EH_FRAME_HDR
        46: 0000000000601000     0 OBJECT  LOCAL  DEFAULT   24 _GLOBAL_OFFSET_TABLE_
        47: 0000000000400560     2 FUNC    GLOBAL DEFAULT   14 __libc_csu_fini
        48: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
        49: 0000000000601020     0 NOTYPE  WEAK   DEFAULT   25 data_start
        50: 0000000000601030     0 NOTYPE  GLOBAL DEFAULT   25 _edata
        51: 0000000000400564     0 FUNC    GLOBAL DEFAULT   15 _fini
        52: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
        53: 0000000000601020     0 NOTYPE  GLOBAL DEFAULT   25 __data_start
        54: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
        55: 0000000000601028     0 OBJECT  GLOBAL HIDDEN    25 __dso_handle
        56: 0000000000400570     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used
        57: 00000000004004f0   101 FUNC    GLOBAL DEFAULT   14 __libc_csu_init
        58: 0000000000601038     0 NOTYPE  GLOBAL DEFAULT   26 _end
        59: 00000000004003e0    42 FUNC    GLOBAL DEFAULT   14 _start
        60: 0000000000601030     0 NOTYPE  GLOBAL DEFAULT   26 __bss_start
        61: 00000000004004d6    11 FUNC    GLOBAL DEFAULT   14 main
        62: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
        63: 0000000000601030     0 OBJECT  GLOBAL HIDDEN    25 __TMC_END__
        64: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
        65: 0000000000400390     0 FUNC    GLOBAL DEFAULT   11 _init
    

    Notice that entry 37 contains the name of the source file. If you try diffing the output of readelf -a, you do get some pretty helpful information:

    81c81
    <   [28] .shstrtab         STRTAB           0000000000000000  0000189f
    ---
    >   [28] .shstrtab         STRTAB           0000000000000000  000018a0
    86c86
    <        0000000000000207  0000000000000000           0     0     1
    ---
    >        0000000000000208  0000000000000000           0     0     1
    211c211
    <     37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS FIRST_PROG.c
    ---
    >     37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS SECOND_PROG.c
    258c258
    <     Build ID: 2c64961288049002e34a1f14e55d6c80dd96816c
    ---
    >     Build ID: 5425dec81aae53bd30e85fe94659d320bb774dcc
    

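    All of these differences appear to boil down to the source file having a different name. SECOND_PROG.c is one character longer than FIRST_PROG.c, which is consistent with the one-byte size increase (0x207 to 0x208) and the one-byte shift in the .shstrtab offset (0x189f to 0x18a0). The Build ID changes as a consequence, because the linker computes it as a hash of the output's contents.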

    So my official answer is "this has nothing whatsoever to do with integer literals and is purely a function of compiling files with different names."