Convert binutils `size` output from "sysv" format (`size --format=sysv my_executable`) to "berkeley" format (`size --format=berkeley my_executable`)


I'd like to know how to get this berkeley format output:

$ size --format=berkeley /bin/ls
   text    data     bss     dec     hex filename
 124042    4728    4832  133602   209e2 /bin/ls

From this sysv format output:

$ size --format=sysv /bin/ls
/bin/ls  :
section                size      addr
.interp                  28       568
.note.ABI-tag            32       596
.note.gnu.build-id       36       628
.gnu.hash               236       664
.dynsym                3576       904
.dynstr                1666      4480
.gnu.version            298      6146
.gnu.version_r          112      6448
.rela.dyn              4944      6560
.rela.plt              2664     11504
.init                    23     14168
.plt                   1792     14192
.plt.got                 24     15984
.text                 74969     16016
.fini                     9     90988
.rodata               19997     91008
.eh_frame_hdr          2180    111008
.eh_frame             11456    113192
.init_array               8   2224112
.fini_array               8   2224120
.data.rel.ro           2616   2224128
.dynamic                512   2226744
.got                    968   2227256
.data                   616   2228224
.bss                   4832   2228864
.gnu_debuglink           52         0
Total                133654

In other words, which of the small parts (sections) of the "sysv" format go into which of the big parts (text, data, and bss) of the "berkeley" format?

So far I've only been guessing, by seeing what sums to what.

Specifically, I'd like to know:

  • ? + ? + ? = text
  • ? + ? + ? = data
  • ? + ? + ? = bss

Related:

  1. [my question] https://electronics.stackexchange.com/questions/363931/how-do-i-find-out-at-compile-time-how-much-of-an-stm32s-flash-memory-and-dynami

Solution

  • Here's the answer:

    TL;DR:

    .interp + .note.ABI-tag + .note.gnu.build-id + .gnu.hash + .dynsym + .dynstr 
    + .gnu.version + .gnu.version_r + .rela.dyn + .rela.plt + .init + .plt 
    + .plt.got + .text + .fini + .rodata + .eh_frame_hdr + .eh_frame
    = text
    
    .init_array + .fini_array + .data.rel.ro + .dynamic + .got + .data
    = data
    
    .bss = bss
    

    See also the image with yellow, blue, and red boxes at the end, for a quick visual summary.
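
The mapping above can also be applied mechanically. Here is a minimal Python sketch, assuming the section-to-category mapping listed above for this particular /bin/ls build. The function name `sysv_to_berkeley` and the three name sets are hypothetical helpers, not part of binutils, and other binaries may contain sections that are not listed here.

```python
# Hypothetical helper: sums `size --format=sysv` rows into berkeley totals.
# The name sets below are the mapping from this answer for one /bin/ls build;
# they are an assumption for any other binary.
TEXT_SECTIONS = {
    ".interp", ".note.ABI-tag", ".note.gnu.build-id", ".gnu.hash", ".dynsym",
    ".dynstr", ".gnu.version", ".gnu.version_r", ".rela.dyn", ".rela.plt",
    ".init", ".plt", ".plt.got", ".text", ".fini", ".rodata",
    ".eh_frame_hdr", ".eh_frame",
}
DATA_SECTIONS = {
    ".init_array", ".fini_array", ".data.rel.ro", ".dynamic", ".got", ".data",
}
BSS_SECTIONS = {".bss"}


def sysv_to_berkeley(sysv_output):
    """Return {'text': n, 'data': n, 'bss': n} from sysv-format size output."""
    totals = {"text": 0, "data": 0, "bss": 0}
    for line in sysv_output.splitlines():
        fields = line.split()
        # Section rows look like "<name> <size> <addr>"; skip everything else
        # (file name line, column header, Total line).
        if len(fields) != 3 or not fields[0].startswith("."):
            continue
        name = fields[0]
        size = int(fields[1], 0)  # base 0 accepts both "28" and "0x1c"
        if name in TEXT_SECTIONS:
            totals["text"] += size
        elif name in DATA_SECTIONS:
            totals["data"] += size
        elif name in BSS_SECTIONS:
            totals["bss"] += size
        # Any other section (for example .gnu_debuglink) is not counted.
    return totals


if __name__ == "__main__":
    sample = """\
section                size      addr
.text                 74969     16016
.data                   616   2228224
.bss                   4832   2228864
.gnu_debuglink           52         0
"""
    print(sysv_to_berkeley(sample))
    # → {'text': 74969, 'data': 616, 'bss': 4832}
```

To use it on a real file, the output of `size --format=sysv my_executable` could be captured (for example with `subprocess.run`) and passed in as a string.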

    Details:

    First, let's print the berkeley size information in hex, with `size -x --format=berkeley /bin/ls` or `size -x /bin/ls` (same thing, since berkeley is the default format):

    $ size -x /bin/ls
       text    data     bss     dec     hex filename
    0x1e48a  0x1278  0x12e0  133602   209e2 /bin/ls
    

    And here's the sysv size output in hex, obtained with `size -x --format=sysv /bin/ls`:

    $ size -x --format=sysv /bin/ls
    /bin/ls  :
    section                 size       addr
    .interp                 0x1c      0x238
    .note.ABI-tag           0x20      0x254
    .note.gnu.build-id      0x24      0x274
    .gnu.hash               0xec      0x298
    .dynsym                0xdf8      0x388
    .dynstr                0x682     0x1180
    .gnu.version           0x12a     0x1802
    .gnu.version_r          0x70     0x1930
    .rela.dyn             0x1350     0x19a0
    .rela.plt              0xa68     0x2cf0
    .init                   0x17     0x3758
    .plt                   0x700     0x3770
    .plt.got                0x18     0x3e70
    .text                0x124d9     0x3e90
    .fini                    0x9    0x1636c
    .rodata               0x4e1d    0x16380
    .eh_frame_hdr          0x884    0x1b1a0
    .eh_frame             0x2cc0    0x1ba28
    .init_array              0x8   0x21eff0
    .fini_array              0x8   0x21eff8
    .data.rel.ro           0xa38   0x21f000
    .dynamic               0x200   0x21fa38
    .got                   0x3c8   0x21fc38
    .data                  0x268   0x220000
    .bss                  0x12e0   0x220280
    .gnu_debuglink          0x34        0x0
    Total                0x20a16
    

    Next, if you run `objdump -h /bin/ls`, you get the following, which shows all output sections in the /bin/ls object file, or executable. These output sections match the output from the `size -x --format=sysv /bin/ls` command, but have more detailed information such as the VMA (Virtual Memory Address) and LMA (Load Memory Address), among other things:

    $ objdump -h /bin/ls
    
    /bin/ls:     file format elf64-x86-64
    
    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .interp       0000001c  0000000000000238  0000000000000238  00000238  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      1 .note.ABI-tag 00000020  0000000000000254  0000000000000254  00000254  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      2 .note.gnu.build-id 00000024  0000000000000274  0000000000000274  00000274  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      3 .gnu.hash     000000ec  0000000000000298  0000000000000298  00000298  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      4 .dynsym       00000df8  0000000000000388  0000000000000388  00000388  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      5 .dynstr       00000682  0000000000001180  0000000000001180  00001180  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      6 .gnu.version  0000012a  0000000000001802  0000000000001802  00001802  2**1
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      7 .gnu.version_r 00000070  0000000000001930  0000000000001930  00001930  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      8 .rela.dyn     00001350  00000000000019a0  00000000000019a0  000019a0  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      9 .rela.plt     00000a68  0000000000002cf0  0000000000002cf0  00002cf0  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     10 .init         00000017  0000000000003758  0000000000003758  00003758  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     11 .plt          00000700  0000000000003770  0000000000003770  00003770  2**4
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     12 .plt.got      00000018  0000000000003e70  0000000000003e70  00003e70  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     13 .text         000124d9  0000000000003e90  0000000000003e90  00003e90  2**4
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     14 .fini         00000009  000000000001636c  000000000001636c  0001636c  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     15 .rodata       00004e1d  0000000000016380  0000000000016380  00016380  2**5
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     16 .eh_frame_hdr 00000884  000000000001b1a0  000000000001b1a0  0001b1a0  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     17 .eh_frame     00002cc0  000000000001ba28  000000000001ba28  0001ba28  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     18 .init_array   00000008  000000000021eff0  000000000021eff0  0001eff0  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     19 .fini_array   00000008  000000000021eff8  000000000021eff8  0001eff8  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     20 .data.rel.ro  00000a38  000000000021f000  000000000021f000  0001f000  2**5
                      CONTENTS, ALLOC, LOAD, DATA
     21 .dynamic      00000200  000000000021fa38  000000000021fa38  0001fa38  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     22 .got          000003c8  000000000021fc38  000000000021fc38  0001fc38  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     23 .data         00000268  0000000000220000  0000000000220000  00020000  2**5
                      CONTENTS, ALLOC, LOAD, DATA
     24 .bss          000012e0  0000000000220280  0000000000220280  00020268  2**5
                      ALLOC
     25 .gnu_debuglink 00000034  0000000000000000  0000000000000000  00020268  2**2
                      CONTENTS, READONLY
    

    A Google search for "vma and lma meaning" turns up a useful quote from the GNU ld linker manual. Let's cite that quote directly from its original source:

    Every loadable or allocatable output section has two addresses. The first is the VMA, or virtual memory address. This is the address the section will have when the output file is run. The second is the LMA, or load memory address. This is the address at which the section will be loaded. In most cases the two addresses will be the same. An example of when they might be different is when a data section is loaded into ROM, and then copied into RAM when the program starts up (this technique is often used to initialize global variables in a ROM based system). In this case the ROM address would be the LMA, and the RAM address would be the VMA.

    You can see the sections in an object file by using the objdump program with the ‘-h’ option.

    (Source: GNU linker script ld manual)

    This means that any output section shown by `objdump -h` which does NOT have a VMA is not part of the program as loaded into memory. The .gnu_debuglink section has a VMA of 0 and lacks the ALLOC and LOAD flags, so that eliminates it.
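
As a quick arithmetic check of that elimination, the sysv Total minus the .gnu_debuglink size should equal the berkeley `dec` total. The sketch below only uses values copied from the outputs above; the variable names are my own.

```python
# Values copied from the size outputs shown above for /bin/ls.
sysv_total = 0x20a16         # "Total" row of the sysv output
gnu_debuglink_size = 0x34    # the one section with VMA 0 and no ALLOC flag
berkeley_dec = 133602        # "dec" column of the berkeley output

remaining = sysv_total - gnu_debuglink_size
print(remaining, hex(remaining))
# → 133602 0x209e2
```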

    Next, we can see that the .bss section has the exact same size (0x12e0) as the berkeley bss section, so that's a match:

    .bss = bss
    

    bss contains the zero-initialized global and static variables.

    So, what about the data output section, which contains all NON-zero-initialized (i.e. initialized with some non-zero value) global and static variables?

    And what about the text output section, which contains all program code and constant (read-only) static and global variables?

    Well, through logical deduction and analysis, and using my prior knowledge about which sections go into Flash vs. RAM vs. both on microcontrollers, I determined that all sections marked READONLY in the `objdump -h` output are counted in the text output section. Some of those sections are flagged DATA (non-zero-initialized, const, i.e. read-only, static and global variables) and some are flagged CODE (the actual program logic, which is also read-only).

    So:

    .interp + .note.ABI-tag + .note.gnu.build-id + .gnu.hash + .dynsym + .dynstr 
    + .gnu.version + .gnu.version_r + .rela.dyn + .rela.plt + .init + .plt 
    + .plt.got + .text + .fini + .rodata + .eh_frame_hdr + .eh_frame
    = text
    

    You can confirm that in the math by summing all their sizes. In hex:

    1c + 20 + 24 + ec + df8 + 682 + 12a + 70 + 1350 + a68 + 17 + 700 + 18 + 124d9 + 9 + 4e1d 
    + 884 + 2cc0 = 1e48a
    

    ...which is the size of the text section shown in the berkeley size output.
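
If you'd rather not add hex by hand, the same sum can be checked in a couple of lines of Python. This is only a convenience sketch; the list is copied from the hex sizes above.

```python
# Hex sizes of the 18 READONLY sections, in the order listed above.
text_section_sizes = [
    0x1c, 0x20, 0x24, 0xec, 0xdf8, 0x682, 0x12a, 0x70, 0x1350, 0xa68,
    0x17, 0x700, 0x18, 0x124d9, 0x9, 0x4e1d, 0x884, 0x2cc0,
]

text_total = sum(text_section_sizes)
print(hex(text_total), text_total)
# → 0x1e48a 124042
```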

    You can see them boxed in yellow in the image below.

    So, the remainder, which are marked DATA and NOT READONLY, are the data sections:

    .init_array + .fini_array + .data.rel.ro + .dynamic + .got + .data
    = data
    

    Again, the hex size summation confirms this:

    8 + 8 + a38 + 200 + 3c8 + 268 = 1278
    

    ...which is the size of the data section in the berkeley size output.

    You can see them boxed in blue in the image below.
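
The flag-based rule can be sketched as a small Python parser over `objdump -h` output. This is a minimal sketch, not binutils' actual implementation: the function name `classify_sections` is hypothetical, and the rule "ALLOC without CONTENTS counts as bss" is my assumption, generalized from the single .bss row shown above.

```python
import re

# Matches a section row such as:
#   " 13 .text         000124d9  0000000000003e90  ..."
# and captures the section name and its hex size.
SECTION_ROW = re.compile(r"^\s*\d+\s+(\S+)\s+([0-9a-fA-F]+)\s")


def classify_sections(objdump_h_output):
    """Sum section sizes into text/data/bss using the flags line of each row."""
    totals = {"text": 0, "data": 0, "bss": 0}
    lines = objdump_h_output.splitlines()
    # Each section row is followed by one line listing its flags.
    for row, flag_line in zip(lines, lines[1:]):
        match = SECTION_ROW.match(row)
        if not match:
            continue
        size = int(match.group(2), 16)
        flags = {flag.strip() for flag in flag_line.split(",")}
        if "ALLOC" not in flags:
            continue  # not loaded into memory (e.g. .gnu_debuglink)
        if "READONLY" in flags or "CODE" in flags:
            totals["text"] += size
        elif "CONTENTS" in flags:
            totals["data"] += size
        else:
            totals["bss"] += size  # assumption: allocated, no file contents
    return totals
```

Feeding it the full `objdump -h /bin/ls` listing above should reproduce the three berkeley totals, if the assumptions hold for your binary.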

    In this image, you can see all 3 berkeley output sections boxed in different colors:

    1. The berkeley-format text output sections (read-only, program logic and const static and global variables) are boxed in yellow.
    2. The berkeley-format data output sections (non-zero-initialized [ie: other-than-zero initialized] static and global variables) are boxed in blue.
    3. The berkeley-format bss output sections (zero-initialized static and global variables) are boxed in red.

    In the case of looking at a microcontroller object file, such as for an STM32 mcu:

    1. Flash memory usage = text + data, and
    2. RAM memory usage from static and global variables = bss + data.
      1. That means the RAM left over for stack (local variables) and heap (dynamic memory allocation) = RAM_total - (bss + data).
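
The microcontroller formulas above can be written as a tiny Python helper. This is a sketch under stated assumptions: the function name `mcu_memory_usage` is hypothetical, and the numbers in the usage example are made up for illustration, not measured from a real STM32 build.

```python
def mcu_memory_usage(text, data, bss, ram_total):
    """Apply the formulas above to berkeley-format size totals (in bytes)."""
    flash_used = text + data   # data's initial values are stored in flash
    static_ram = bss + data    # static and global variables live in RAM
    return {
        "flash_used": flash_used,
        "static_ram": static_ram,
        "ram_left_for_stack_and_heap": ram_total - static_ram,
    }


if __name__ == "__main__":
    # Hypothetical example values: a part with 20 KiB of RAM.
    print(mcu_memory_usage(text=20000, data=500, bss=1500, ram_total=20 * 1024))
    # → {'flash_used': 20500, 'static_ram': 2000, 'ram_left_for_stack_and_heap': 18480}
```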

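    [Image: section listing with the berkeley text sections boxed in yellow, the data sections boxed in blue, and the bss section boxed in red]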

    Primary References:

    1. GNU Linker (ld) manual, section "3.1 Basic Linker Script Concepts": https://sourceware.org/binutils/docs/ld/Basic-Script-Concepts.html#Basic-Script-Concepts
    2. [my own question here] https://electronics.stackexchange.com/questions/363931/how-do-i-find-out-at-compile-time-how-much-of-an-stm32s-flash-memory-and-dynami