assembly machine-code microprocessors 8-bit

How does a computer distinguish between Data and Instructions?


I watched a video of an 8-bit computer being fed a program manually, using physical switches.

The fed program was:

MAIN:
    0000 0001 0100     # 0 = LDA [4]
    0001 0010 0101     # 1 = ADD [5]
    0010 0101 0000     # 2 = OUT
    0011 1111 0000     # 3 = HLT

DATA:
    0100 00001110      # 4 = #14
    0101 00011100      # 5 = #28

What I want to know is how the computer, if it does, distinguishes between Data and Instructions, because there are no flags that divide data from instructions.

0001 0001 0010 may be interpreted as either:

1 = LDA [2]

or:

1 = #18

Is it because, while the program runs, memory locations are treated as instructions, but once the HLT is reached the program stops executing memory locations as if they were instructions and leaves the higher addresses alone, while LDA / ADD / SUB etc. treat every location in memory as a plain binary value?

In that case, would:

0000 0010 0000 be interpreted as:

0 = ADD #32

and not

0 = ADD [ ADD [ ADD [ ADD ...]]]

** While writing this question I realised new things as I was going along

better example:

If the halt wasn't there, would the program work fine, but then keep on going down into the data and interpret it as:

0100 0000 1110      # 4 = NOP [14]
0101 0001 1100      # 5 = LDA [12]

If so, would the computer crash, 1: because NOP is given an operand, and 2: because memory addresses 12 and 14 are undefined?


Solution

  • You are on the verge of an important realization: data has no meaning without metadata. To make sense of a given sequence of bits, there has to be some "knowledge" about how those bits are supposed to be interpreted.

    As far as instructions are concerned, the CPU's instruction set defines the size of each instruction and its accompanying data. Each instruction begins with the opcode, and the following data is typically fixed size (and the size depends on the opcode). Each instruction is executed in order (until a jump instruction is encountered), starting from some initial address that is hardwired into the CPU.

    So if the initial address happens to be the address of the MAIN label (address 0), the first word the CPU fetches will be 0001 0100; the leading 0000 in your listing is the address of that word, not part of it. The CPU decodes the upper four bits, 0001, as the LDA opcode, which it knows is supposed to be followed by a four-bit operand, here 0100 (address 4). The word at the next address is the next instruction.

    What happens if a bad jump instruction is executed later, sending the CPU to address 4, where your data lives? Then indeed, the CPU will mistake 0000 1110 for an instruction, taking 0000 as the opcode and 1110 as its operand, and things will likely go very wrong.
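
To make this concrete, here is a minimal sketch of a fetch-decode-execute loop for a machine like the one in the question. It is an illustration under stated assumptions, not a model of the actual hardware in the video: the opcode values are the ones from the question's listing, the `run` function and its `trace` are hypothetical names, memory is assumed to be 16 bytes, and unused memory is assumed to be zero (real RAM may power up with arbitrary contents). The only thing that makes a byte an "instruction" here is that the program counter pointed at it.

```python
# Opcodes as used in the question's listing (an assumption about that machine).
NOP, LDA, ADD, OUT, HLT = 0b0000, 0b0001, 0b0010, 0b0101, 0b1111


def run(program, max_steps=16):
    """Run a fetch-decode-execute loop over 16 bytes of memory.

    Returns (outputs, trace); trace holds (address, opcode, operand)
    for every word that was executed as an instruction.
    """
    memory = list(program) + [0] * (16 - len(program))  # unused RAM assumed zero
    pc, acc = 0, 0
    outputs, trace = [], []
    for _ in range(max_steps):
        word = memory[pc]                          # fetch whatever the PC points at
        opcode, operand = word >> 4, word & 0x0F   # decode: high nibble, low nibble
        trace.append((pc, opcode, operand))
        pc = (pc + 1) & 0x0F                       # 4-bit program counter wraps around
        if opcode == LDA:
            acc = memory[operand]                  # the addressed byte is used as data
        elif opcode == ADD:
            acc = (acc + memory[operand]) & 0xFF   # 8-bit accumulator
        elif opcode == OUT:
            outputs.append(acc)
        elif opcode == HLT:
            break
        # NOP and unknown opcodes do nothing in this sketch; the operand is ignored
    return outputs, trace


program = [
    0b0001_0100,  # 0: LDA [4]
    0b0010_0101,  # 1: ADD [5]
    0b0101_0000,  # 2: OUT
    0b1111_0000,  # 3: HLT
    0b0000_1110,  # 4: data, 14
    0b0001_1100,  # 5: data, 28
]

with_halt = run(program)

no_halt = list(program)
no_halt[3] = 0b0000_0000  # replace HLT with NOP
without_halt = run(no_halt)

print(with_halt[0])           # [42]
print(without_halt[1][4:6])   # [(4, 0, 14), (5, 1, 12)]
```

With the HLT in place, only addresses 0 to 3 are ever executed, and addresses 4 and 5 are only ever read as data. Without it, the same two data bytes are fetched and decoded as NOP [14] and LDA [12]. In this sketch nothing crashes: the NOP ignores its operand, and the LDA overwrites the accumulator with whatever sits at address 12.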