Tags: assembly, nasm

Can the _start symbol in the assembly be replaced with another word?


Two days ago I started learning assembly, and I could not find answers to these questions on the internet, so I would be glad if you could help. I learned that the starting point of the program must be declared with global _start. I have two questions. First, in all the code I have seen, global _start was written inside the text section. Is it possible to write global _start outside the text section? Second, can the _start in global _start be changed? That is, if I write global _asd or global qwe to define the starting point of the program, will I get a syntax error?

Note: I'm currently on Linux (Ubuntu), using nasm as the assembler and ld as the linker.


Solution

  • This is a GNU ld question, not a nasm one: when ld links, it looks for that symbol to mark as the entry point. Your question is vague about the target, but mentioning nasm implies x86, and Linux of course is not vague.

    Since the program you are building is loaded by an operating system like Linux, the entry point is critical, unless you manipulate the binary in some way or tell the linker what your entry point is. If execution does not start at the proper place, your program will not operate properly and will quite likely simply crash. You can't just jump into the middle of a program and hope for success, much less start executing in .data or anything else that is not code.

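    Now, as mentioned in the comments (please upvote them), you can change the entry point label if you don't want to use _start: pass the label to ld with -e (for example ld -e qwe myprog.o -o myprog) or name it with ENTRY() in a linker script, as the examples below show. Writing global _asd or global qwe is not a syntax error; nasm accepts any valid label name. If you don't define _start and don't name another label, ld gives a warning and continues anyway, but then you are at risk of it entering in the wrong place.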

    If this were bare metal, for example a microcontroller, you would not have an operating system loading the program into memory and entering it wherever in the binary you specify. Instead you are governed by the hardware/logic and have to conform to its rules: you craft the code, linker script, command line, etc. so that the generated binary matches the entry point the logic specifies. In that case you can do without _start altogether and take whatever default ld puts in its output binary, which at some point is used to program the flash/ROM in the MCU (a step that strips all of that knowledge from the binary file, including the entry point).
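As a rough illustration of that last point, here is a minimal Python sketch, assuming the linked output has already been reduced to a list of (load address, bytes) chunks. It is a toy model of producing a raw flash image, not how objcopy or any other real tool is implemented, and the names `flatten` and `fill` are hypothetical. The result is only a base address plus bytes: nothing in it records where execution starts, so the MCU's hardware rules decide that.

```python
from __future__ import annotations


def flatten(chunks: list[tuple[int, bytes]], fill: int = 0xFF) -> tuple[int, bytes]:
    """Pack (load_address, data) chunks into one contiguous image.

    The image starts at the lowest load address; gaps between chunks are
    padded with `fill` (an arbitrary choice in this sketch). No entry point
    is recorded anywhere: the result is just a base address and raw bytes.
    """
    if not chunks:
        return 0, b""
    base = min(addr for addr, _ in chunks)
    end = max(addr + len(data) for addr, data in chunks)
    image = bytearray([fill]) * (end - base)
    for addr, data in chunks:
        offset = addr - base
        image[offset:offset + len(data)] = data
    return base, bytes(image)


# Hypothetical example: one ARM nop (0xe1a00000, little-endian) at 0x1000
# and two data bytes at 0x1008.
base, image = flatten([(0x1000, b"\x00\x00\xa0\xe1"), (0x1008, b"\x01\x02")])
print(hex(base), image.hex())  # 0x1000 0000a0e1ffffffff0102
```

Whatever the hardware executes first (for example a reset vector at a fixed address) must already be at the right offset in that image; the ELF entry point no longer plays any role.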

    I am not so sure about nasm, but assume you are always in some section, so the label will land somewhere. (The global directive only exports the name; it is where the label itself is defined that decides its section.) Suppose the label is not in a .text section and you are using it as the entry point (by default, by not specifying something else). Even if it is the last line before a .text section declaration, the linker will group it with the other labels of the section it actually lands in. So because it sits just before the .text declaration in the file rather than just after, it may end up at an address nowhere near the code that follows it in the source file.
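A toy Python model of that grouping, assuming each source item is reduced to (section, label, size) in source order and the section start addresses come from a linker script. The function name `place_labels` and the labels `msg` and `code` are hypothetical; real linkers also deal with alignment, relocations and much more.

```python
from __future__ import annotations


def place_labels(items: list[tuple[str, str | None, int]],
                 origins: dict[str, int]) -> dict[str, int]:
    """Give each label an address inside the output section it belongs to.

    items:   source-order (section, label or None, size in bytes) entries.
    origins: start address of each output section (as a linker script sets).
    Source order only matters within a section; the section a label is in
    decides which address range it lands in.
    """
    cursor = dict(origins)          # next free address per section
    addresses: dict[str, int] = {}
    for section, label, size in items:
        if label is not None:
            addresses[label] = cursor[section]
        cursor[section] += size
    return addresses


# Source order: a 4-byte word in .data, then _start defined while .data is
# still the current section (just before switching to .text), then code.
items = [
    (".data", "msg", 4),      # msg: dd 0            (hypothetical)
    (".data", "_start", 0),   # _start:              (a label takes no bytes)
    (".text", "code", 4),     # first instruction after "section .text"
]
layout = place_labels(items, {".text": 0x2000, ".data": 0x1000})
print({name: hex(addr) for name, addr in layout.items()})
# {'msg': '0x1000', '_start': '0x1004', 'code': '0x2000'}
```

Here _start ends up at 0x1004 inside .data, nowhere near the code at 0x2000, even though it directly precedes that code in the source.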

    Some examples follow, using GNU tools (ARM gas here, as the disassembly shows). The question is specific to ld, so the target and assembler don't really matter.

    The linker script, so.ld:

    MEMORY
    {
        one   : ORIGIN = 0x1000, LENGTH = 0x1000
        two   : ORIGIN = 0x2000, LENGTH = 0x1000
        three : ORIGIN = 0x3000, LENGTH = 0x1000
    }
    SECTIONS
    {
        .text   : { *(.text*)   } > one
        .data   : { *(.data*)   } > two
        .bss    : { *(.bss*)    } > three
    }

    and the source:

    .globl _start
    _start:
        nop
    

    Building it and running readelf -h on the output shows:

      Entry point address:               0x1000
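readelf -h takes this value from the e_entry field of the ELF file header. A minimal Python sketch of the same lookup, assuming a well-formed file; `elf_entry` and `elf_entry_from_file` are hypothetical helper names. The field sits at offset 24 in both 32-bit and 64-bit ELF headers; only its width (4 or 8 bytes) and byte order differ, and both are read from e_ident.

```python
import struct


def elf_entry(header: bytes) -> int:
    """Return the e_entry field from the start of an ELF file."""
    if len(header) < 16 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_data == 1:
        order = "<"                      # ELFDATA2LSB: little-endian
    elif ei_data == 2:
        order = ">"                      # ELFDATA2MSB: big-endian
    else:
        raise ValueError(f"unknown EI_DATA {ei_data}")
    if ei_class == 1:
        width = "I"                      # ELFCLASS32: 4-byte address
    elif ei_class == 2:
        width = "Q"                      # ELFCLASS64: 8-byte address
    else:
        raise ValueError(f"unknown EI_CLASS {ei_class}")
    # e_ident (16) + e_type (2) + e_machine (2) + e_version (4) = offset 24
    (entry,) = struct.unpack_from(order + width, header, 24)
    return entry


def elf_entry_from_file(path: str) -> int:
    """Read just the first 64 bytes (an ELF64 header) and return e_entry."""
    with open(path, "rb") as f:
        return elf_entry(f.read(64))


# Usage, with a path such as the so.elf built in this answer:
# print(hex(elf_entry_from_file("so.elf")))
```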
    

    Now if I surround _start with two more labels:

    .globl here
    here:
        nop
    
    .globl _start
    _start:
        nop
    
    .globl there
    there:
        nop

    the disassembly and readelf output are then:

    00001000 <here>:
        1000:   e1a00000    nop         ; (mov r0, r0)
    
    00001004 <_start>:
        1004:   e1a00000    nop         ; (mov r0, r0)
    
    00001008 <there>:
        1008:   e1a00000    nop         ; (mov r0, r0)
    
      Entry point address:               0x1000
    

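    That may be confusing: _start is at 0x1004, yet the entry point is 0x1000. This linker script has no ENTRY() and the command line had no -e, so ld did not pick _start and simply used the start of .text. Now add -e _start: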

    arm-linux-gnueabi-ld -nostdlib -nostartfiles -e _start -T so.ld so.o -o so.elf
    
      Entry point address:               0x1004
    

    Or, instead of -e, add ENTRY() to the linker script:

    ENTRY(_start)
    MEMORY
    {
        one   : ORIGIN = 0x1000, LENGTH = 0x1000
    ...
    
    
      Entry point address:               0x1004
    

    But I can also drop _start entirely and name a different label:

    .globl here
    here:
        nop

        nop

    .globl there
    there:
        nop

    with this linker script:

    ENTRY(there)
    MEMORY
    {
        one   : ORIGIN = 0x1000, LENGTH = 0x1000
    
      Entry point address:               0x1008
    

    Note that the linker didn't warn about the missing _start.

    If I now remove ENTRY() from the linker script, the entry point falls back to the start of .text:

      Entry point address:               0x1000
    

    But if I link without any linker script:

    arm-none-eabi-ld so.o -o so.elf
    arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
    

    With no linker script, ld uses its built-in default, and that default asks for _start, so ld goes looking for it and warns when it cannot find it. We can do the same thing ourselves with

    ENTRY(_start)
    MEMORY
    {
    

    but without defining a global _start label:

    arm-linux-gnueabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
    

    So if you are simply running

    nasm -f elf64 myprog.asm -o myprog.o
    ld myprog.o -o myprog
    

    you are using whatever default linker settings/script the tool or environment provides, and it likely contains ENTRY(_start) or an equivalent. If you take complete control of the linker and want to load a program on Linux, you need a safe/sane entry point for the program to work; otherwise ld defaults to the beginning of the binary or the beginning of .text. We can test which:

    SECTIONS
    {
        .text   : { *(.text*)   } > two
        .data   : { *(.data*)   } > one
        .bss    : { *(.bss*)    } > three
    }
    
    .globl here
    here:
        nop
    
    .data
    .word 0x12345678
    
    arm-linux-gnueabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000002000
    
    
    Disassembly of section .text:
    
    00002000 <here>:
        2000:   e1a00000    nop         ; (mov r0, r0)
    
    Disassembly of section .data:
    
    00001000 <.data>:
        1000:   12345678
    

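    So the fallback is the beginning of .text, not the beginning of the binary or its lowest address. The entry point does not even have to be code; ld will happily use a data label: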

    ENTRY(somedata)
    MEMORY
    {
        one   : ORIGIN = 0x1000, LENGTH = 0x1000
        two   : ORIGIN = 0x2000, LENGTH = 0x1000
        three : ORIGIN = 0x3000, LENGTH = 0x1000
    }
    SECTIONS
    {
        .text   : { *(.text*)   } > two
        .data   : { *(.data*)   } > one
        .bss    : { *(.bss*)    } > three
    }
    
    
    .globl here
    here:
        nop
    
    .data
    .globl somedata
    somedata: .word 0x12345678
    
      Entry point address:               0x1000
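To tie the observations above together, here is a small Python model of how ld picks the entry point, following the order described in the GNU ld manual: -e, then ENTRY(), then a target-specific default symbol, then the start of .text, then address 0. It is a sketch under stated assumptions, not ld's code: the symbol table and .text address are passed in by hand, the default symbol is assumed to be `start` (which would explain why a defined `_start` alone was not used when the custom script had no ENTRY()), numeric -e values and the exact wording for a missing .text are not modeled, and `resolve_entry` and its parameters are hypothetical names.

```python
from __future__ import annotations


def resolve_entry(symbols: dict[str, int],
                  text_start: int | None,
                  cmdline_e: str | None = None,
                  script_entry: str | None = None,
                  default_symbol: str = "start") -> tuple[int, str | None]:
    """Toy model of how GNU ld chooses the entry point.

    Returns (entry_address, warning or None). Order checked:
      1. -e SYMBOL on the command line (wins over the script)
      2. ENTRY(SYMBOL) in the linker script
      3. a target-specific default symbol, looked up silently
         (assumed to be "start" here, not "_start")
      4. the first address of .text
      5. address 0
    """
    fallback = text_start if text_start is not None else 0
    requested = cmdline_e if cmdline_e is not None else script_entry
    if requested is not None:
        if requested in symbols:
            return symbols[requested], None
        # Only an explicitly requested but missing symbol produces a warning.
        return fallback, (f"warning: cannot find entry symbol {requested}; "
                          f"defaulting to {fallback:016x}")
    if default_symbol in symbols:
        return symbols[default_symbol], None
    return fallback, None


# ENTRY(_start) in the script but no _start defined, .text at 0x1000:
print(resolve_entry({"here": 0x1000}, 0x1000, script_entry="_start"))
# (4096, 'warning: cannot find entry symbol _start; defaulting to 0000000000001000')
```

Running the earlier examples through it (labels here/_start/there at 0x1000/0x1004/0x1008) gives 0x1000 with no -e or ENTRY(), 0x1004 with either -e _start or ENTRY(_start), and 0x1008 with ENTRY(there), matching the readelf output shown.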
    

    This is just as easy to do with nasm and ld as it is with gas and ld above. It shows that _start is no more magic to ld (or even gcc) than main() is. _start seems magic because the default linker scripts call it out, so folks assume it is special. main() seems magic because the language defines it that way, but in reality it is the bootstrap that makes it so, and if you simply run

    gcc helloworld.c -o helloworld
    

    you are getting the default bootstrap and linker script. But you could write your own bootstrap, or modify the one in your C library, and use it with no main() in your program at all; the tools don't care and it will work fine. (Not all tools, of course: some detect main() and add critical things that might not otherwise get added, especially for C++.) The GNU tools in particular are flexible and generic, which makes them usable for so many targets, from bare metal to kernel drivers to operating system applications.

    Use the tools you have; they are very powerful. Do experiments like the ones above first.