programming-languages implementation buffer-overflow

Why is bounds checking not implemented in some languages?


According to Wikipedia (http://en.wikipedia.org/wiki/Buffer_overflow):

Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array. Bounds checking can prevent buffer overflows.

So why is bounds checking not implemented in languages like C and C++?


Solution

  • Basically, it's because bounds checking means that every time you index into an array, the generated code has to do an extra comparison, in effect an if statement.

    Let's consider a simple C for loop:

    int ary[X] = {...};  // Purposefully leaving size and initializer unknown
    
    for (int ix = 0; ix < 23; ix++) {
        printf("ary[%d]=%d\n", ix, ary[ix]);
    }
    

    If we have bounds checking, the generated code for the loop has to be something like

    LOOP:
        CMP IX, 23      ; while test
        JGE END         ; leave the loop once IX >= 23
        CMP IX, X       ; bounds check: compare IX and X
        JGE ERROR       ; if IX >= X jump to ERROR
        LD  R1, IX      ; put the value of IX into register 1
        LD  R2, ARY+IX  ; put the array value in R2
        LA  R3, Str42   ; Str42 is the format string
        JSR PRINTF      ; now we call the printf routine
        INC IX          ; add 1 to ix
        J   LOOP        ; go back to the top of the loop
    END:
    
    ;;; somewhere else in the code
    ERROR:
        HCF             ; halt and catch fire
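
    In C terms, that extra compare-and-branch pair corresponds roughly to wrapping every array access in a function like the one below. This is only a sketch: checked_get is a hypothetical helper, not something C or its standard library provides, and a compiler that did bounds checking would presumably emit the test inline and trap instead of returning a status.

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical helper: read ary[ix] only if ix is inside [0, len).
     * Returns true and stores the element in *out on success;
     * returns false and leaves *out untouched when ix is out of range. */
    static bool checked_get(const int *ary, size_t len, size_t ix, int *out)
    {
        if (ix >= len) {        /* the CMP IX, X / JGE ERROR pair */
            return false;
        }
        *out = ary[ix];         /* the plain, unchecked load */
        return true;
    }
    ```

    The point is that the if runs on every single access, even inside a loop whose limit is already known to be safe.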
    

    If we don't have that bounds check, then we can write instead:

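        LD R1, IX       ; keep the index in register 1
    LOOP:
        CMP R1, 23      ; while test
        JGE END         ; leave the loop once R1 >= 23
        LD R2, ARY+R1   ; put the array value in R2, no bounds check
        JSR PRINTF      ; call the printf routine
        INC R1          ; add 1 to the index
        J   LOOP        ; go back to the top of the loop
    END: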
    

    This saves 3-4 instructions per loop iteration, which (especially in the old days) meant a lot.

    In fact, on the PDP-11 machines it was even better, because there were addressing modes called "auto-increment" and "auto-decrement". On a PDP, all of the register juggling turned into something like

    CZ  -(IX), END    ; compare IX to zero, then decrement; jump to END if zero
    

    (And anyone who happens to remember the PDP better than I do, don't give me trouble about the precise syntax etc; you're an old fart like me, you know how these things slip away.)
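
    The closest C-level counterpart to that addressing mode is the `*p++` idiom, where a single expression reads through a pointer and advances it. A rough sketch follows; the name sum_ints is made up for illustration, and whether a given compiler actually emits an auto-increment instruction for it depends on the target machine and the optimizer.

    ```c
    #include <stddef.h>

    /* Sum n ints by walking a pointer. There is no index variable and
     * no bounds check, only a count that runs down to zero. */
    static long sum_ints(const int *p, size_t n)
    {
        long total = 0;
        while (n-- > 0) {       /* test the count against zero, then decrement */
            total += *p++;      /* load through p, then advance p */
        }
        return total;
    }
    ```

    Nothing here stops the caller from passing an n larger than the array, which is exactly the trade-off the question is about.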