Search code examples
brainfuckesoteric-languages

How does the Brainfuck Hello World actually work?


Someone sent this to me and claimed it is a hello world in Brainfuck (and I hope so...)

++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.

I know the basics that it works by moving a pointer and increment and decrementing stuff...

Yet I still want to know, how does it actually work? How does it print anything on the screen in the first place? How does it encode the text? I do not understand at all...


Solution

  • 1. Basics

    To understand Brainfuck you must imagine infinite array of cells initialized by 0 each.

    ...[0][0][0][0][0]...
    

    When brainfuck program starts, it points to any cell.

    ...[0][0][*0*][0][0]...
    

    If you move pointer right > you are moving pointer from cell X to cell X+1

    ...[0][0][0][*0*][0]...
    

    If you increase cell value + you get:

    ...[0][0][0][*1*][0]...
    

    If you increase cell value again + you get:

    ...[0][0][0][*2*][0]...
    

    If you decrease cell value - you get:

    ...[0][0][0][*1*][0]...
    

    If you move pointer left < you are moving pointer from cell X to cell X-1

    ...[0][0][*0*][1][0]...
    

    2. Input

    To read character you use comma ,. What it does is: Read character from standard input and write its decimal ASCII code to the actual cell.

    Take a look at ASCII table. For example, decimal code of ! is 33, while a is 97.

    Well, lets imagine your BF program memory looks like:

    ...[0][0][*0*][0][0]...
    

    Assuming standard input stands for a, if you use comma , operator, what BF does is read a decimal ASCII code 97 to memory:

    ...[0][0][*97*][0][0]...
    

    You generally want to think that way, however the truth is a bit more complex. The truth is BF does not read a character but a byte (whatever that byte is). Let me show you example:

    In linux

    $ printf ł
    

    prints:

    ł
    

    which is specific polish character. This character is not encoded by ASCII encoding. In this case it's UTF-8 encoding, so it used to take more than one byte in computer memory. We can prove it by making a hexadecimal dump:

    $ printf ł | hd
    

    which shows:

    00000000  c5 82                                             |..|
    

    Zeroes are offset. 82 is first and c5 is second byte representing ł (in order we will read them). |..| is graphical representation which is not possible in this case.

    Well, if you pass ł as input to your BF program that reads single byte, program memory will look like:

    ...[0][0][*197*][0][0]...
    

    Why 197 ? Well 197 decimal is c5 hexadecimal. Seems familiar ? Of course. It's first byte of ł !

    3. Output

    To print character you use dot . What it does is: Assuming we treat actual cell value like decimal ASCII code, print corresponding character to standard output.

    Well, lets imagine your BF program memory looks like:

    ...[0][0][*97*][0][0]...
    

    If you use dot (.) operator now, what BF does is print:

    a

    Because a decimal code in ASCII is 97.

    So for example BF program like this (97 pluses 2 dots):

    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++..

    Will increase value of the cell it points to up to 97 and print it out 2 times.

    aa

    4. Loops

    In BF loop consists of loop begin [ and loop end ]. You can think it's like while in C/C++ where the condition is actual cell value.

    Take a look BF program below:

    ++[]
    

    ++ increments actual cell value twice:

    ...[0][0][*2*][0][0]...
    

    And [] is like while(2) {}, so it's infinite loop.

    Let's say we don't want this loop to be infinite. We can do for example:

    ++[-]
    

    So each time a loop loops it decrements actual cell value. Once actual cell value is 0 loop ends:

    ...[0][0][*2*][0][0]...        loop starts
    ...[0][0][*1*][0][0]...        after first iteration
    ...[0][0][*0*][0][0]...        after second iteration (loop ends)
    

    Let's consider yet another example of finite loop:

    ++[>]
    

    This example shows, we haven't to finish loop at cell that loop started on:

    ...[0][0][*2*][0][0]...        loop starts
    ...[0][0][2][*0*][0]...        after first iteration (loop ends)
    

    However it is good practice to end where we started. Why ? Because if loop ends another cell it started, we can't assume where the cell pointer will be. To be honest, this practice makes brainfuck less brainfuck.