For context I am x86 golfing.
00000005 <start>:
5: e8 25 00 00 00 call 2f <cube>
a: 50 push %eax
Multiple calls later...
0000002f <cube>:
2f: 89 c8 mov %ecx,%eax
31: f7 e9 imul %ecx
33: f7 e9 imul %ecx
35: c3 ret
call took 5 bytes even though the offset fits into a single byte! Is there any way to write call cube and assemble with the GNU assembler and get a smaller offset? I understand 16-bit offsets could be used, but ideally I'd have a 2-byte instruction like call reg.
There is no call rel8, or any way to push a return address and jmp in fewer than 5 bytes.
To come out ahead with call reg (2 bytes), you need to generate a full address in a register in fewer than 3 bytes. Even a RIP-relative LEA (64-bit mode only) doesn't help, because it only exists in rel32 form, not rel8.
For a single call, it's clearly not worth it.
If you can reuse the same function pointer register for multiple 2-byte call reg instructions, then you come out ahead even with just 2 calls. (A 5-byte mov reg, imm32 plus 2x 2-byte call reg is a total of 9 bytes, vs. 10 for 2x 5-byte call.) But it does cost you a register.
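As a rough sketch of that trade-off, here is the question's cube function called twice through a register, in GNU as AT&T syntax for 32-bit Linux. The choice of %ebx as the pointer register, the input value 3, and the exit sequence are assumptions made for illustration; they are not from the question. It should assemble with as --32 and link with ld -m elf_i386, and the encodings in the comments can be checked with objdump -d.

```asm
        .text
        .globl  _start
_start:
        mov     $3, %ecx        # illustrative input for cube (setup, not golfed)
        mov     $cube, %ebx     # bb <imm32>  5 bytes, paid once
        call    *%ebx           # ff d3       2 bytes
        push    %eax
        call    *%ebx           # ff d3       2 bytes
        push    %eax
        # 5 + 2 + 2 = 9 bytes of call overhead, vs. 2 * 5 = 10 for two call rel32

        mov     $1, %eax        # Linux i386 exit(0), only so the sketch terminates
        xor     %ebx, %ebx
        int     $0x80

cube:                           # input in %ecx, result in %eax, clobbers %edx
        mov     %ecx, %eax
        imul    %ecx
        imul    %ecx
        ret
```

Note that mov $cube, %ebx embeds an absolute address, so this form is not position-independent, and the callee must not clobber the pointer register.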
Most OSes don't let you map anything in the lowest pages (so that NULL-pointer dereferences fault), so usable addresses are larger than 16 bits in 32-bit or 64-bit mode. 66 E8 rel16 (4-byte callw) isn't an option even in 32-bit mode; that would truncate EIP to IP. https://www.felixcloutier.com/x86/call
In 32-bit / 64-bit code, I'd consider the linker options necessary to get your code mapped in the zero page as part of the byte-count of your code-golf answer. (And also the /proc/sys/vm/mmap_min_addr kernel setting, or equivalent on other OSes.) Normally we justify not counting the ELF metadata at all in code-golf, only bytes of the .text section, so special linker tricks open up a can of worms there.
Generally avoid call in code-golf if you can. It's usually better to structure your loops to avoid needing code-reuse, e.g. jmp into the middle of a loop to get part of the loop to run the right number of times, instead of calling a block multiple times.
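To illustrate the jmp-into-the-loop idea, here is a small sketch in GNU as AT&T syntax for 32-bit Linux. The task is hypothetical (write n items with n-1 separators between them into a buffer), and the label names, the buffer buf, and n = 3 are all made up for the example. The item code runs n times and the separator code n-1 times, without either block being a function.

```asm
        .text
        .globl  _start
_start:
        mov     $buf, %edi      # destination pointer (setup, not golfed)
        mov     $3, %ecx        # n = 3 items; n must be at least 1
        jmp     .Litem          # eb xx  2 bytes: skip the separator on the first pass
.Lsep:
        mov     $0x2c, %al      # ',' separator, runs n-1 times
        stosb
.Litem:
        mov     $0x78, %al      # 'x' item, runs n times
        stosb
        loop    .Lsep           # e2 xx  2 bytes: dec %ecx, jump while non-zero
        # buf now holds "x,x,x"

        mov     $1, %eax        # Linux i386 exit(0), only so the sketch terminates
        xor     %ebx, %ebx
        int     $0x80

        .bss
buf:    .skip   16
```

The entry jmp rel8 costs 2 bytes, compared with 5 bytes for each call rel32 plus 1 for the ret if the item code were factored out as a function.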
I guess I usually look at code-golf questions which lend themselves naturally to machine code, and can avoid needing the same block of code from multiple places. I can already spend hours tweaking a short function, so starting an answer to a question that will take more code (and thus have even more room for optimization between / across parts of it) is rare for me.