Which type of assembler jump instruction is most useful?

I am working on a hobby project to design a small CPU and a microcomputer system around it. I have a basic dataflow and have made some decisions about word size and the number of instructions in the ISA. I have decided on a 16-bit word with a 4-bit opcode field. This allows for 16 instructions.

EDIT : Some additional information

The instruction word, 16-bit, is formatted as follows : OOOOPPPPPPPPPPPP

O-bits are the opcode; P-bits are the payload (address or data). The 12 payload bits allow for 4096 words (4 kilowords) of address space, and the 4 opcode bits allow for 16 opcodes. The restriction to 16 opcodes is a design decision to make the hardware implementation easier.
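As a quick illustration of this layout, the field split can be sketched in Python. The function names `decode` and `encode` are made up for this example and are not part of the design:

```python
def decode(word):
    """Split a 16-bit instruction word into (opcode, payload)."""
    word &= 0xFFFF
    opcode = (word >> 12) & 0xF   # top 4 bits: OOOO
    payload = word & 0x0FFF       # low 12 bits: PPPPPPPPPPPP
    return opcode, payload


def encode(opcode, payload):
    """Pack a 4-bit opcode and a 12-bit payload into one 16-bit word."""
    if not 0 <= opcode <= 0xF:
        raise ValueError("opcode must fit in 4 bits")
    if not 0 <= payload <= 0xFFF:
        raise ValueError("payload must fit in 12 bits")
    return (opcode << 12) | payload
```

For example, `decode(0xA123)` gives opcode `0xA` and payload `0x123`.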

The architecture has one accumulator (A-register), a B-register and a stack pointer.

/ENDEDIT

The ISA has move and load instructions (load/store), basic arithmetic and logic (add, sub, NAND), and basic stack functions (push and pop). There is also a subroutine (unconditional) jump/return pair of instructions. With all this there is still one piece missing: conditional jumps.

The return function can be the same as for unconditional jump, so what I need is a test of some kind and a jump on condition of that test.

So, here is my question : what is the most USEFUL kind of test/conditional pair to have?

There is room for ONE test and ONE conditional jump left in the opcode table, or TWO conditional jumps if you somehow include the condition test in other operations, like ADD or SUB.

Jump on overflow, not Zero, Zero, carry, not carry are some of my ideas, but I am unsure which one(s) would be more useful. Any suggestions are appreciated.

Solution

(I've been coding assembler for 40+ years; I actually designed and built a production multi-register 16-bit machine back in the early 1970s.) What would be really useful is a CMP instruction, and a relative JMP that specifies which condition to test.

I suggest you let the arithmetic instructions produce status bits for

Zero result
Carry from result
Sign of result
(Signed) Overflow

We'll call these "condition bits". You'll discover they are all pretty useful.
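To make the four bits concrete, here is a minimal Python sketch of how a 16-bit add could produce them. The name `add16` and the flag letters Z, C, N, V are my own labels for this illustration, not something fixed by your design:

```python
MASK = 0xFFFF   # 16-bit word
SIGN = 0x8000   # most significant (sign) bit


def add16(a, b, carry_in=0):
    """Return (result, flags) for a 16-bit add with optional carry-in."""
    a &= MASK
    b &= MASK
    full = a + b + carry_in          # unbounded sum, may exceed 16 bits
    result = full & MASK
    flags = {
        "Z": result == 0,            # zero result
        "C": full > MASK,            # carry out of bit 15
        "N": bool(result & SIGN),    # sign of result
        # Signed overflow: operands share a sign, result's sign differs.
        "V": bool(~(a ^ b) & (a ^ result) & SIGN),
    }
    return result, flags
```

Adding 1 to `0x7FFF` sets N and V (the signed result wrapped), while adding 1 to `0xFFFF` sets Z and C (the unsigned result wrapped).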

Your compare instruction should basically do a subtract, and throw the answer away, setting the condition bits.
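A rough Python sketch of that behaviour follows. The name `cmp16` is hypothetical, and I am assuming that the carry bit means "borrow occurred" after a subtract; some machines use the inverted sense, so pick one convention and stick to it:

```python
def cmp16(a, b):
    """Compute a - b on 16-bit words, discard the difference, return flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    diff = (a - b) & 0xFFFF          # the answer that gets thrown away
    return {
        "Z": diff == 0,              # a == b
        "C": a < b,                  # assumption: C = borrow (unsigned a < b)
        "N": bool(diff & 0x8000),    # sign of the difference
        # Signed overflow: operand signs differ and the result's sign
        # differs from the sign of a.
        "V": bool((a ^ b) & (a ^ diff) & 0x8000),
    }
```

With these bits, "equal" is Z, unsigned "less than" is C, and signed "less than" is N differing from V, which is why all four conditions earn their keep.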

Your JMP instruction should use one of the 16 opcodes, a 3-bit condition selector, and a 9-bit offset relative to the PC, i.e., a conditional PC-relative jump (4 + 3 + 9 = 16 bits). Short relative branches are really useful in code.

The 3 bit condition selector should use two bits to select one of the 4 conditions, and one bit to select "invert". This way you can have "jmp zero", "jmp not zero", etc.
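Here is one way the decode could look, as a Python sketch. The bit positions (invert in bit 11, condition in bits 10-9, offset in bits 8-0), the signed two's-complement offset, the offset being relative to the address of the next instruction, and the 12-bit PC wrap are all assumptions of mine for illustration; the names are hypothetical too:

```python
COND_NAMES = ("Z", "C", "N", "V")    # assumed order of the 2-bit selector


def sign_extend9(x):
    """Interpret the low 9 bits of x as a two's-complement offset."""
    x &= 0x1FF
    return x - 0x200 if x & 0x100 else x


def branch_target(word, pc, flags):
    """Return the next PC after a conditional relative jump at address pc."""
    invert = (word >> 11) & 1        # 1 bit: jump when the condition is false
    cond = (word >> 9) & 0b11        # 2 bits: which condition bit to test
    offset = sign_extend9(word)      # 9 bits: range -256..+255
    taken = bool(flags[COND_NAMES[cond]]) != bool(invert)
    next_pc = (pc + 1) & 0xFFF       # assumed 12-bit program counter
    return (next_pc + offset) & 0xFFF if taken else next_pc
```

For instance, with a made-up opcode of `0xF`, the word `0xF1FE` means "jump if zero, offset -2" and `0xF9FE` means "jump if not zero, offset -2".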

You might treat "jump no overflow" instead as "jump always"; these are very handy in code. (I don't understand how you think "jmp unconditional" and "return from subroutine" can be the same instruction.)

From comments in the discussion, it seems you don't have any registers. That's a serious mistake from the point of view of code compactness (esp. with 16-bit instructions), and performance (registers are invariably faster to access than memory). I'd use 2 or 3 of the 12 bits left over after the opcode to name registers.

Aha: in an edit, OP notes he has A and B registers, probably selected implicitly by opcode. The machine I designed was inspired by the PDP-11, but had 16 base opcodes with 8 registers (leaving 9 bits of "operand") and it really made a difference in coding. Most of the instructions broke those 9 bits into addressing modes and offsets: direct (uses a 2nd word as the address), immediate (2nd word), immediate (7 bits, which meant the machine could process ASCII text easily), indirect 2nd register auto-increment, indirect 2nd register auto-decrement, short-offset relative to 2nd register, and 2nd-word offset relative to register. It did not have a stack, but you can implement one trivially using the auto-increment/auto-decrement modes, so no loss. It was a pleasure to code for. (It also had virtual memory with traps, and a hardware context switch between register sets and VM mappings.)
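To show why no dedicated stack hardware is needed, here is a small Python sketch of push and pop built from the two auto-modifying modes. It is not a model of my machine: the class and method names are invented, the memory size is arbitrary, and I am assuming the PDP-11 convention of decrement-before-store and increment-after-load:

```python
class Machine:
    """Toy register machine with auto-increment/auto-decrement addressing."""

    def __init__(self, mem_words=4096, num_regs=8):
        self.mem = [0] * mem_words
        self.reg = [0] * num_regs

    def store_autodec(self, r, value):
        """Store through register r after decrementing it (acts as PUSH)."""
        self.reg[r] -= 1
        self.mem[self.reg[r]] = value & 0xFFFF

    def load_autoinc(self, r):
        """Load through register r, then increment it (acts as POP)."""
        value = self.mem[self.reg[r]]
        self.reg[r] += 1
        return value


m = Machine()
m.reg[7] = 4096            # any register can serve as a stack pointer
m.store_autodec(7, 0x1234) # push
m.store_autodec(7, 0x5678) # push
top = m.load_autoinc(7)    # pop, last in first out
```

Because the modes work with any register, you get as many independent stacks as you have spare registers.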

While all this sounds like it requires complex logic to decode, it does not. We managed to implement this machine with roughly 120 so-called "medium scale" chips (4 gates per chip, multiplexors, 4-bit adder slices, etc.). These days I'd expect you to be able to implement the whole thing in a single FPGA.