If the sequential execution of instructions passes offset 65535, then the 8086 will fetch the next instruction byte from offset 0 in the same code segment.
The following .COM program exploits this fact: it continually appends a copy of its entire code (32 bytes in total) to its own tail, wrapping around in the 64KB code segment. You could call this a binary quine.
ORG 256 ; .COM programs start with CS=DS=ES=SS
Begin:
mov ax, cs ; 2 Providing an exterior stack
add ax, 4096 ; 3
mov ss, ax ; 2
mov sp, 256 ; 3
cli ; 1
call Next ; 3 This gets encoded with a relative offset
Next:
pop bx ; 1 -> BX is current address of Next
sub bx, 14 ; 3 -> BX is current address of Begin
More:
mov al, [bx] ; 2
mov [bx+32], al ; 3
inc bx ; 1
test bx, 31 ; 4
jnz More ; 2
nop ; 1 Padding: the instructions above total 30 bytes
nop ; 1 -> 32 bytes in total
For the benefit of the call and pop instructions, the program sets up a small stack outside the code segment. I don't think the cli is really necessary because we do have a stack.
Once we have calculated the address of the current start of our 32-byte program, we copy it 32 bytes higher in memory. All the BX pointer arithmetic wraps around at 64KB.
We then fall through into the newly written code.
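To make the pointer wraparound concrete, here is a small sketch of what happens when the copy loop reaches the last 32-byte slot of the segment. It is not part of the program above; the starting value of BX is picked purely for illustration, and LEA stands in for the store so that no memory is touched. It assumes 16-bit addressing, where effective addresses are truncated to 16 bits.

```nasm
; Illustration only: address arithmetic at the last 32-byte slot.
        mov     bx, 0FFE0h      ; start of the last slot (offset 65504)
        lea     si, [bx+32]     ; same address calculation as MOV [BX+32], AL:
                                ; 0FFE0h + 20h = 10000h, truncated -> SI = 0000h
        add     bx, 32          ; register arithmetic wraps the same way:
                                ; BX = 0000h (and CF is set)
        test    bx, 31          ; 0 is a multiple of 32, so ZF = 1
                                ; and a following JNZ would not be taken
```

With 65536 / 32 = 2048 slots in the segment, the 2048th copy lands back on offset 256, where the program started.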
If the sequential execution of instructions passes offset 65535, then the 80386 will trigger exception 13.
Assuming that I include the necessary setup for an exception handler, would it be enough to just execute a far jump to the start of this code segment (where the newly written code sits waiting)? And would such a solution remain valid on post-80386 CPUs?
Related: Is it possible to make an assembly program that writes itself forever?
In 16-bit mode (real or protected), the IP register will wrap around at 64KiB without any fault, provided that no instruction crosses the 64KiB boundary (e.g. a two-byte instruction placed at 0xffff).
A crossing instruction will fault on an 80386+; I'm not sure what happens on earlier models (does it read the next byte in the linear address space, or the next byte from offset 0?).
Note that this works because the segment limit is the same as the IP register "limit".
In 16-bit protected mode you can set a segment limit below 64KiB; in that case, execution will fault when it reaches the end of the segment instead of wrapping.
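As an illustration of such a smaller limit, a descriptor for a 16-bit code segment limited to 32KiB might look like the sketch below. The label, base and limit are made-up values for the example, and the layout shown is the 8-byte 80386 descriptor format; placing the entry in a GDT and switching to protected mode are not shown.

```nasm
; Hypothetical GDT entry: 16-bit code segment, base 0, limit 7FFFh.
code16_desc:
        dw      7FFFh           ; limit bits 15..0: last valid offset is 7FFFh
        dw      0000h           ; base bits 15..0
        db      00h             ; base bits 23..16
        db      9Ah             ; access: present, DPL 0, code, execute/read
        db      00h             ; G=0 (byte granularity), D=0 (16-bit),
                                ; limit bits 19..16 = 0
        db      00h             ; base bits 31..24
```

With this descriptor loaded into CS, the last byte that can be fetched is the one at offset 7FFFh.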
In short (and figuratively), the CPU makes sure all the bytes it needs are within the segment limit and then will increment the program counter without overflow detection.
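So your program should work: it starts at offset 256 and moves in 32-byte steps, so every copy is 32-byte aligned and the copy in the last slot ends exactly at offset 0xffff, meaning no instruction crosses the boundary.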
It's probably a bit of a stretch to call it a quine because it's reading its own machine code and that's cheating (just like reading the source code file is for high-level languages).
I haven't tested it, but a very minimal example of a program "kind of replicating" itself could be:
;Setup (assuming ES=CS and DF=0)
mov al, 0abh ;This encodes stosb
mov di, _next ;Where to start writing the instruction stream
stosb ;Let's roll
_next:
This is also not a quine because only the stosb is replicated.
Making a quine is hard: the stores must be instructions whose encoding is smaller than the data they store, or we will always have more bytes to write than those written.