macos assembly arm64 memory-alignment gnu-assembler

Align assembly to end of 4K block


I have some assembly that I'd like placed at the end of a 4K block. The section is currently being put at 0x1000003C0; I'd like it to be located at 0x100003F80.

I tried using .p2align, but it seems it didn't put the code at the end of the 4K block.


Solution

  • You can do this if you know a previous 4k-alignment point, but the tools don't make it easy to avoid wasting a huge amount of space.

    .balign  4096
    pagestart:            // a page-aligned reference at some earlier point.
            nop
            .skip 5680
            nop           // some arbitrary amount of code after it, perhaps more than a page.
    
     .skip 4096 - (. - pagestart) % 4096 - blocksize    // pad to blocksize before end of page
    blockstart:
            add x1, x1, x2
            add x2, x2, x3
    // 4k boundary here
    blockend:
    .equ blocksize, blockend - blockstart
            nop         // more code
    

    clang -target arm64 -c foo.s && llvm-objdump -d foo.o

    foo.o:  file format elf64-littleaarch64     (I'm on GNU/Linux, not macOS)
    
    Disassembly of section .text:
    
    0000000000000000 <pagestart>:
           0: 1f 20 03 d5   nop
    
    0000000000000004 <$d.1>:              // placeholder for actual code
           4:       00 00 00 00     .word   0x00000000
           8:       00 00 00 00     .word   0x00000000
           ...
        1630:       00 00 00 00     .word   0x00000000
    
    0000000000001634 <$x.2>:
        1634: 1f 20 03 d5   nop         // end of actual code
    
    0000000000001638 <$d.3>:            // padding for alignment of blockend
        1638:       00 00 00 00     .word   0x00000000
        ...
        1ff0:       00 00 00 00     .word   0x00000000
        1ff4:       00 00 00 00     .word   0x00000000
    
    0000000000001ff8 <blockstart>:
        1ff8: 21 00 02 8b   add     x1, x1, x2
        1ffc: 42 00 03 8b   add     x2, x2, x3
    
    0000000000002000 <blockend>:       // note 4k alignment
        2000: 1f 20 03 d5   nop
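
The addresses in the listing can be checked by redoing the `.skip` arithmetic by hand. Below is a minimal sketch in Python, not part of the original answer; the function name `pad_before_block` is made up for illustration, and the offset and block size are simply read off the disassembly above.

```python
PAGE = 4096  # the 4k block size used in the .skip expression

def pad_before_block(offset, blocksize, page=PAGE):
    """Mirror of `.skip 4096 - (. - pagestart) % 4096 - blocksize`.

    offset    -- bytes from the page-aligned label to the current location
    blocksize -- size of the block that should end on a page boundary
    (hypothetical helper, for illustration only)
    """
    return page - offset % page - blocksize

# Values taken from the listing: the last nop ends at 0x1638,
# and the block is two 4-byte add instructions.
offset = 0x1638
blocksize = 8

pad = pad_before_block(offset, blocksize)
blockstart = offset + pad
blockend = blockstart + blocksize

print(pad, hex(blockstart), hex(blockend))  # prints: 2496 0x1ff8 0x2000
```

The 2496 bytes of padding correspond to the zero-filled region from 0x1638 up to blockstart at 0x1ff8.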
    

    So this costs 0 to 4092 bytes of padding, depending on the block size and on where the preceding code ends. It also requires a 4k-aligned point inside this .s file, and the sizes need to be assemble-time constants, not just link-time ones, since I don't think a relocation entry can express the % modulo. Even without the modulo, a relocation probably can't express the subtraction and the variable-sized skip.
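
To make the padding range concrete, here is a small Python sketch of the same arithmetic (an illustration, not something from the original answer). It sweeps every 4-byte-aligned offset within a page for a 4-byte block, and it also shows that the expression goes negative when the block no longer fits in the rest of the current page. The `pad_wrapping` variant is a hypothetical alternative that pushes the block to the end of the next page in that case; both function names are made up, and the variant has not been tried as an actual `.skip` expression.

```python
PAGE = 4096

def pad_as_written(offset, blocksize, page=PAGE):
    """The expression from the answer: page - offset % page - blocksize.
    Goes negative when the block does not fit before the next boundary."""
    return page - offset % page - blocksize

def pad_wrapping(offset, blocksize, page=PAGE):
    """Hypothetical variant: always in 0..page-1, so a block that does not
    fit in the current page ends at the following page boundary instead."""
    return (-(offset + blocksize)) % page

# Sweep all 4-byte-aligned offsets in one page for a single 4-byte instruction.
pads = [pad_as_written(off, 4) for off in range(0, PAGE, 4)]
print(min(pads), max(pads))          # prints: 0 4092

# A two-instruction block starting 4 bytes before the boundary does not fit.
print(pad_as_written(0xFFC, 8))      # prints: -4
print(pad_wrapping(0xFFC, 8))        # prints: 4092
```

When the block does fit, the two functions agree, so the variant only changes the case where the original expression would go negative.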

    This doesn't work for me with clang -target arm64-macos -c foo.s on Linux, so I'm not sure it's usable with Mach-O64 object files. Even without the % 4096, I still get an assemble-time error from .skip 4096 - (. - pagestart) - blocksize: "error: expected assembly-time absolute expression".