I was reading up about breakpoints from a few articles such as this: https://interrupt.memfault.com/blog/cortex-m-breakpoints
Most resources mention that the processor gets halted. What does a processor halt mean? The processor would still be getting clock input right? If so, it should ideally start fetching and executing the next instruction. However, that does not happen.
So, can anyone help me understand what happens to the processor when a breakpoint is hit?
Do not confuse the clocking of a processor (or of any logic) with fetching and executing. There is no one-to-one link between a clock cycle and a fetch or an execute. All of the processor is clocked, not just the state machine that does the fetch, decode, execute, etc. The clock feeds the memory interfaces, many of the signals and registers are clocked, there is some form of processor bus that runs off the clock, the debug interface is clocked, and so on.
Other than some low power modes we will not get into, the clock to the processor is not gated (like the gate around my yard, which can be open or closed, letting people through or not; the clock can likewise be blocked or not). Clock gating is not used for breakpoints or halting in general; those are just inputs to the state machine that runs the processor (google "state machine" or "finite state machine").
You may already know from textbook processors (fetch, decode, execute, writeback, ...), which to some extent are not how processors are implemented outside the classroom, that the processor may stall the pipeline. There are names for the reasons to stall, but this is another example of the disconnect between fetch/execute and the processor's clock. The processor clock keeps going but the processor does not actually fetch and/or execute. Usually this is temporary.
In an MCU like the cortex-m family described in your external reference, you normally execute instructions from flash, and you normally talk to peripherals. It is very common for the flash to run on a clock that is half, or some other fraction, of the processor clock. And for many parts, if you push the processor clock faster with a PLL, there may be rules that the peripheral clock has to be slower. There is also no reason to assume that a bus transaction to a peripheral, at the same clock rate or slower, completes in one clock; it does not. Many of the cortex-m cores fetch either 16 or 32 bits per fetch (bus) cycle. If the flash is at, say, half the clock rate of the processor and the core is fetching a halfword at a time, then the processor can fetch at most one instruction every two clock cycles, and thus can execute no more than one instruction every two clock cycles, and it is slower than that a lot of the time. Likewise if it takes, say, 8 clock cycles to read the status of the uart, then the execution state of the LDR reading that address stalls the processor for at least 8 clocks. Some prefetching and some decoding may be in flight, but the execution stage is stalled and eventually the whole processor stalls.
Halts and breakpoints are no different, except that the processor never leaves the "execution" state, or at least not on its own.
As shown with the debugger, there are signals that the human can drive through the debugger to kick the processor out of that execution state into other states (perhaps fetching from a new address, or just moving it out of the execution state into the next state).
So as an example let us make a simple cortex-m simulator that only knows about a few instructions. It is not a parallel pipeline (and some of these cortex-ms have very few stages in their pipe, not even enough for all the textbook stages) but a serial execution, perhaps the thing you do in that college course before you move on to a parallel pipe. This could be optimized more, but it is intentionally broken into a number of states, and to some extent does one thing per clock.
Some processors implement general purpose registers such that each register is its own chunk of flip flops and multiple registers can be accessed in one clock. For demonstration purposes mine is going to be a register file, a.k.a. sram, and single ported, so that only one thing can happen at a time: one read or one write. So if I want to do an add r0,r1,r2, it takes a whole clock to get the value of r2 from the register file, a whole separate clock to get r1, and a whole separate clock to write r0 (after adding).
I am going to cheat a little here and there. I could go through the states it takes to do a reset, which for a cortex-m involves at a minimum reading the word at address 0x00000000 and the word at address 0x00000004. (Note some folks think instructions do this; no, just logic. Instructions are a concept that the logic operates on, just like the words on this page mean something to us but are built out of individual letters of the alphabet and displayed using many pixels.) These reads of memory are likely separate bus cycles just to get the stack pointer init value and the reset exception handler address. So I cheated there. I also made my memory 16 bits wide, not 32 nor 64, etc., which makes the code a bit easier to read.
My program under test is
.thumb
.cpu cortex-m0
.word 0x20001000
.word reset
.thumb_func
reset:
add r1,#1
add r2,#2
add r1,r1,r2
add r1,r1,r1
add r1,#3
add r1,#4
add r1,#5
add r1,#6
bkpt
add r1,#7
add r1,#8
add r1,#9
add r1,#10
b .
And here is my state machine based processor in C
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
unsigned short mem[256];
unsigned int reg[16];
unsigned int pc;
unsigned int pc_next;
unsigned int state;
unsigned int next_state;
unsigned int alu_a;
unsigned int alu_b;
unsigned int alu_out;
unsigned int rd;
unsigned int rn;
unsigned int rm;
unsigned int imm;
unsigned short inst;
unsigned int print_breakpoint_flag;
enum
{
NONE,
FETCH,
DECODE,
ADD_IMM,
ADD_REG_A,
ADD_REG_B,
ADD_EXECUTE,
ADD_WRITEBACK,
BREAKPOINT
};
void reset ( void )
{
//normally this is assumed to be random garbage not zeros;
memset(reg,0,sizeof(reg));
//this would normally be done in the state machine as well with
//some number of clock cycles
reg[13]=mem[1];
reg[13]<<=16;
reg[13]|=mem[0];
pc_next=mem[3];
pc_next<<=16;
pc_next|=mem[2];
if((pc_next&1)==0)
{
printf("thumb only\n");
exit(1);
}
pc_next&=(~1);
printf("RESET 0x%08X 0x%08X\n",reg[13],pc_next);
next_state=FETCH;
}
void one_clock ( void )
{
state=next_state;
next_state=NONE;
switch(state)
{
case FETCH:
{
printf("FETCH 0x%08X\n",pc_next);
pc=pc_next;
pc_next=pc+2;
reg[15]=pc+4;
inst=mem[pc>>1];
next_state=DECODE;
print_breakpoint_flag=1;
break;
}
case DECODE:
{
printf("DECODE 0x%04X\n",inst);
if((inst&0xF800)==0x3000)
{
//add immediate immed8
rd=(inst>>8)&0x7;
rn=rd;
imm=(inst>>0)&0xFF;
next_state=ADD_IMM;
}
if((inst&0xFE00)==0x1800)
{
rd=(inst>>0)&7;
rn=(inst>>3)&7;
rm=(inst>>6)&7;
next_state=ADD_REG_A;
break;
}
if((inst&0xFF00)==0xBE00)
{
imm=(inst>>0)&0xFF;
next_state=BREAKPOINT;
}
break;
}
case ADD_IMM:
{
printf("ADD_IMM r%u #0x%X\n",rn,imm);
alu_a=reg[rn];
alu_b=imm;
next_state=ADD_EXECUTE;
break;
}
case ADD_EXECUTE:
{
printf("ADD_EXECUTE 0x%08X 0x%08X\n",alu_a,alu_b);
alu_out=alu_a+alu_b;
//not doing flags
next_state=ADD_WRITEBACK;
break;
}
case ADD_WRITEBACK:
{
printf("ADD_WRITEBACK r%u 0x%08X\n",rd,alu_out);
reg[rd]=alu_out;
next_state=FETCH;
break;
}
case ADD_REG_A:
{
printf("ADD_REG_A r%u\n",rn);
alu_a=reg[rn];
next_state=ADD_REG_B;
break;
}
case ADD_REG_B:
{
printf("ADD_REG_B r%u\n",rm);
alu_b=reg[rm];
next_state=ADD_EXECUTE;
break;
}
case BREAKPOINT:
{
if(print_breakpoint_flag)
{
printf("BREAKPOINT\n");
print_breakpoint_flag=0;
}
//some debugger hardware would be implemented to
//kick the state machine out of this state.
next_state=BREAKPOINT;
break;
}
default:
{
exit(0);
}
}
}
int main ( void )
{
//00000000 <reset-0x8>:
mem[1]=0x2000; mem[0]=0x1000;//0: 20001000 .word 0x20001000
mem[3]=0x0000; mem[2]=0x0009;//4: 00000009 .word 0x00000009
//00000008 <reset>:
mem[ 4]=0x3101; // 8: 3101 adds r1, #1
mem[ 5]=0x3202; // a: 3202 adds r2, #2
mem[ 6]=0x1889; // c: 1889 adds r1, r1, r2
mem[ 7]=0x1849; // e: 1849 adds r1, r1, r1
mem[ 8]=0x3103;//1 10: 3103 adds r1, #3
mem[ 9]=0x3104;//1 12: 3104 adds r1, #4
mem[10]=0x3105;//1 14: 3105 adds r1, #5
mem[11]=0x3106;//1 16: 3106 adds r1, #6
mem[12]=0xbe00;//1 18: be00 bkpt 0x0000
mem[13]=0x3107;//1 1a: 3107 adds r1, #7
mem[14]=0x3108;//1 1c: 3108 adds r1, #8
mem[15]=0x3109;//1 1e: 3109 adds r1, #9
mem[16]=0x310a;//2 20: 310a adds r1, #10
mem[17]=0xe7fe;//2 22: e7fe b.n 22 <reset+0x1a>
reset();
while(1)
{
one_clock();
}
return(0);
}
The processor clock runs forever, no matter what
while(1)
{
one_clock();
}
I did not deal with the flags that an add sets, and I am not doing any conditional execution in the few instructions I supported. This is obviously not a complete processor; it is just the minimum brute force code to handle a few instructions.
The output of the program looks like this
RESET 0x20001000 0x00000008
FETCH 0x00000008
DECODE 0x3101
ADD_IMM r1 #0x1
ADD_EXECUTE 0x00000000 0x00000001
ADD_WRITEBACK r1 0x00000001
FETCH 0x0000000A
DECODE 0x3202
ADD_IMM r2 #0x2
ADD_EXECUTE 0x00000000 0x00000002
ADD_WRITEBACK r2 0x00000002
FETCH 0x0000000C
DECODE 0x1889
ADD_REG_A r1
ADD_REG_B r2
ADD_EXECUTE 0x00000001 0x00000002
ADD_WRITEBACK r1 0x00000003
FETCH 0x0000000E
DECODE 0x1849
ADD_REG_A r1
ADD_REG_B r1
ADD_EXECUTE 0x00000003 0x00000003
ADD_WRITEBACK r1 0x00000006
FETCH 0x00000010
DECODE 0x3103
ADD_IMM r1 #0x3
ADD_EXECUTE 0x00000006 0x00000003
ADD_WRITEBACK r1 0x00000009
FETCH 0x00000012
DECODE 0x3104
ADD_IMM r1 #0x4
ADD_EXECUTE 0x00000009 0x00000004
ADD_WRITEBACK r1 0x0000000D
FETCH 0x00000014
DECODE 0x3105
ADD_IMM r1 #0x5
ADD_EXECUTE 0x0000000D 0x00000005
ADD_WRITEBACK r1 0x00000012
FETCH 0x00000016
DECODE 0x3106
ADD_IMM r1 #0x6
ADD_EXECUTE 0x00000012 0x00000006
ADD_WRITEBACK r1 0x00000018
FETCH 0x00000018
DECODE 0xBE00
BREAKPOINT
Each line is a clock, a state in the state machine, except at the end: instead of printing BREAKPOINT infinitely I only print it once.
And hopefully this answers the question.
case BREAKPOINT:
{
next_state=BREAKPOINT;
break;
}
The processor is being clocked, the clock does not stop, but the state machine is stuck in the breakpoint state forever.
In a real processor there would be a way out: some other signals, also clocked by the processor clock, that belong not to this state machine but to other state machines. (Remember that unlike a C program, things in logic happen in parallel; to demonstrate that, one_clock would have to contain multiple state machines or other individual signals.)
With a little googling and some terminal code from stackoverflow, a key press can play the role of the debugger:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/select.h>
#include <termios.h>
unsigned short mem[256];
unsigned int reg[16];
unsigned int pc;
unsigned int pc_next;
unsigned int state;
unsigned int next_state;
unsigned int alu_a;
unsigned int alu_b;
unsigned int alu_out;
unsigned int rd;
unsigned int rn;
unsigned int rm;
unsigned int imm;
unsigned short inst;
unsigned int print_breakpoint_flag;
unsigned int exit_breakpoint;
enum
{
NONE,
FETCH,
DECODE,
ADD_IMM,
ADD_REG_A,
ADD_REG_B,
ADD_EXECUTE,
ADD_WRITEBACK,
BREAKPOINT
};
void reset ( void )
{
//normally this is assumed to be random garbage not zeros;
memset(reg,0,sizeof(reg));
//this would normally be done in the state machine as well with
//some number of clock cycles
reg[13]=mem[1];
reg[13]<<=16;
reg[13]|=mem[0];
pc_next=mem[3];
pc_next<<=16;
pc_next|=mem[2];
if((pc_next&1)==0)
{
printf("thumb only\r\n");
exit(1);
}
pc_next&=(~1);
printf("RESET 0x%08X 0x%08X\r\n",reg[13],pc_next);
next_state=FETCH;
}
void one_clock ( void )
{
state=next_state;
next_state=NONE;
switch(state)
{
case FETCH:
{
printf("FETCH 0x%08X\r\n",pc_next);
pc=pc_next;
pc_next=pc+2;
reg[15]=pc+4;
inst=mem[pc>>1];
next_state=DECODE;
print_breakpoint_flag=1;
break;
}
case DECODE:
{
printf("DECODE 0x%04X\r\n",inst);
if((inst&0xF800)==0x3000)
{
//add immediate immed8
rd=(inst>>8)&0x7;
rn=rd;
imm=(inst>>0)&0xFF;
next_state=ADD_IMM;
}
if((inst&0xFE00)==0x1800)
{
rd=(inst>>0)&7;
rn=(inst>>3)&7;
rm=(inst>>6)&7;
next_state=ADD_REG_A;
break;
}
if((inst&0xFF00)==0xBE00)
{
imm=(inst>>0)&0xFF;
next_state=BREAKPOINT;
}
break;
}
case ADD_IMM:
{
printf("ADD_IMM r%u #0x%X\r\n",rn,imm);
alu_a=reg[rn];
alu_b=imm;
next_state=ADD_EXECUTE;
break;
}
case ADD_EXECUTE:
{
printf("ADD_EXECUTE 0x%08X 0x%08X\r\n",alu_a,alu_b);
alu_out=alu_a+alu_b;
//not doing flags
next_state=ADD_WRITEBACK;
break;
}
case ADD_WRITEBACK:
{
printf("ADD_WRITEBACK r%u 0x%08X\r\n",rd,alu_out);
reg[rd]=alu_out;
next_state=FETCH;
break;
}
case ADD_REG_A:
{
printf("ADD_REG_A r%u\r\n",rn);
alu_a=reg[rn];
next_state=ADD_REG_B;
break;
}
case ADD_REG_B:
{
printf("ADD_REG_B r%u\r\n",rm);
alu_b=reg[rm];
next_state=ADD_EXECUTE;
break;
}
case BREAKPOINT:
{
if(print_breakpoint_flag)
{
printf("BREAKPOINT\r\n");
print_breakpoint_flag=0;
}
//some debugger hardware would be implemented to
//kick the state machine out of this state.
next_state=BREAKPOINT;
if(exit_breakpoint)
{
exit_breakpoint=0;
next_state=FETCH;
}
break;
}
default:
{
exit(0);
}
}
}
struct termios orig_termios;
void reset_terminal_mode()
{
tcsetattr(0, TCSANOW, &orig_termios);
}
void set_conio_terminal_mode()
{
struct termios new_termios;
/* take two copies - one for now, one for later */
tcgetattr(0, &orig_termios);
memcpy(&new_termios, &orig_termios, sizeof(new_termios));
/* register cleanup handler, and set the new terminal mode */
atexit(reset_terminal_mode);
cfmakeraw(&new_termios);
tcsetattr(0, TCSANOW, &new_termios);
}
int kbhit()
{
struct timeval tv = { 0L, 0L };
fd_set fds;
FD_ZERO(&fds);
FD_SET(0, &fds);
return select(1, &fds, NULL, NULL, &tv) > 0;
}
int getch()
{
int r;
unsigned char c;
if ((r = read(0, &c, sizeof(c))) < 0) {
return r;
} else {
return c;
}
}
int main ( void )
{
//00000000 <reset-0x8>:
mem[1]=0x2000; mem[0]=0x1000;//0: 20001000 .word 0x20001000
mem[3]=0x0000; mem[2]=0x0009;//4: 00000009 .word 0x00000009
//00000008 <reset>:
mem[ 4]=0x3101; // 8: 3101 adds r1, #1
mem[ 5]=0x3202; // a: 3202 adds r2, #2
mem[ 6]=0x1889; // c: 1889 adds r1, r1, r2
mem[ 7]=0x1849; // e: 1849 adds r1, r1, r1
mem[ 8]=0x3103;//1 10: 3103 adds r1, #3
mem[ 9]=0x3104;//1 12: 3104 adds r1, #4
mem[10]=0x3105;//1 14: 3105 adds r1, #5
mem[11]=0x3106;//1 16: 3106 adds r1, #6
mem[12]=0xbe00;//1 18: be00 bkpt 0x0000
mem[13]=0x3107;//1 1a: 3107 adds r1, #7
mem[14]=0x3108;//1 1c: 3108 adds r1, #8
mem[15]=0x3109;//1 1e: 3109 adds r1, #9
mem[16]=0x310a;//2 20: 310a adds r1, #10
mem[17]=0xe7fe;//2 22: e7fe b.n 22 <reset+0x1a>
set_conio_terminal_mode();
exit_breakpoint=0;
reset();
while(1)
{
if(kbhit())
{
getch();
exit_breakpoint=1;
}
one_clock();
}
return(0);
}
At least on my linux system, I can now run it: it "halts" at the breakpoint until I press a key on the keyboard, and then it continues.
case BREAKPOINT:
{
if(print_breakpoint_flag)
{
printf("BREAKPOINT\r\n");
print_breakpoint_flag=0;
}
//some debugger hardware would be implemented to
//kick the state machine out of this state.
next_state=BREAKPOINT;
if(exit_breakpoint)
{
exit_breakpoint=0;
next_state=FETCH;
}
break;
}
BREAKPOINT
FETCH 0x0000001A
DECODE 0x3107
ADD_IMM r1 #0x7
ADD_EXECUTE 0x00000018 0x00000007
ADD_WRITEBACK r1 0x0000001F
FETCH 0x0000001C
DECODE 0x3108
ADD_IMM r1 #0x8
ADD_EXECUTE 0x0000001F 0x00000008
ADD_WRITEBACK r1 0x00000027
FETCH 0x0000001E
DECODE 0x3109
ADD_IMM r1 #0x9
ADD_EXECUTE 0x00000027 0x00000009
ADD_WRITEBACK r1 0x00000030
FETCH 0x00000020
DECODE 0x310A
ADD_IMM r1 #0xA
ADD_EXECUTE 0x00000030 0x0000000A
ADD_WRITEBACK r1 0x0000003A
FETCH 0x00000022
DECODE 0xE7FE
And we see the rest of the execution up to the branch to self, which I did not implement, so it hits the NONE state and exits the program.
In the case of the arm cortex-m family, the arm documentation says:
Breakpoint causes a HardFault exception or a debug halt to occur depending on the presence and configuration of the debug support.
I have chosen "debug halt" here. If implemented as a HardFault instead, the processor would not stop execution; it would read the HardFault exception handler address and then fetch instructions from there, along with all the stack work that the processor does to save state before handling the exception. All of this is to some extent documented in the arm documentation.
The cortex-m, and arms in general, also have WFI (wait for interrupt) as another example:
Wait For Interrupt is a hint instruction that suspends execution until one of a number of events occurs.
And then a page of the documentation goes through the possible ways of getting out of a WFI (if actually implemented; on some cores a WFI is just a nop and it does not wait).
A halt, if a processor has one, would have fewer ways out than a WFI but would be similar to a breakpoint as far as how to get out of it (a reset, or some debugger interaction to change the state of the state machine).
Not all processors have a halt. What I have shown so far is a breakpoint instruction. In a ram based system, where your program is in ram, an instruction would be replaced by a breakpoint, as your external documentation states or implies. Given code like this:
add r1,#4
add r1,#5
add r1,#6
add r3,r3,r1
add r1,#7
add r1,#8
add r1,#9
add r1,#10
You might go into some gui debug tool, select the add r3,r3,r1 and click some breakpoint thing. That may literally cause the gui software to write a 0xbe00 instruction where that add was, and the software would remember that the add was there. When you execute, the breakpoint happens and some debug logic tells the gui (there are more wires and signals in the processor that let the debugger detect that the breakpoint was executed). When you press some continue button, the gui will/may replace the breakpoint instruction in that memory location with the real add instruction, and then change the processor state to execute that address again. That would be the kind of debugger that clears the breakpoint once you have stopped on it. Some may keep the breakpoint, and in that case would likely replace the breakpoint with the real instruction, single step, put the breakpoint back, and then continue execution.
Single stepping with a debugger is just more signals into the processor's execution state machine from some other debugger state machine, to put the processor into a halted state: state=next_state; if debugger_state==step then state=WAIT_FOR_STEP. That wait-for-step state would wait for some debugger state to change, or for some other signal. (Think of signals and registers in logic as variables in C.)
The other example, halting on an address, would live in our FETCH state, for example:
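case FETCH:
{
//hardware_monitor_address is a comparator register the debugger would write
//and HALT is one more state in the enum; neither is in the code above
if(pc_next==hardware_monitor_address)
{
next_state=HALT;
break;
}
printf("FETCH 0x%08X\r\n",pc_next);
pc=pc_next;
pc_next=pc+2;
reg[15]=pc+4;
inst=mem[pc>>1];
next_state=DECODE;
print_breakpoint_flag=1;
break;
}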
That would put us in a halt state similar to the breakpoint, and we would need signals from the debugger to kick us out of that state.