c, linux, function, memory, objdump

Why do I get incorrect results "ffff..." when inspecting the bytes that make up a compiled function stored in memory?


I've been delving deeper into Linux and C, and I'm curious how functions are stored in memory. I have the following function:

void test(){
    printf( "test\n" );
}

Simple enough. When I run objdump on the executable that has this function, I get the following:

08048464 <test>:
 8048464:       55                      push   %ebp
 8048465:       89 e5                   mov    %esp,%ebp
 8048467:       83 ec 18                sub    $0x18,%esp
 804846a:       b8 20 86 04 08          mov    $0x8048620,%eax
 804846f:       89 04 24                mov    %eax,(%esp)
 8048472:       e8 11 ff ff ff          call   8048388 <printf@plt>
 8048477:       c9                      leave
 8048478:       c3                      ret

Which all looks right. The interesting part is when I run the following piece of code:

int main( void ) {
    char data[20];
    int i;    
    memset( data, 0, sizeof( data ) );
    memcpy( data, test, 20 * sizeof( char ) );
    for( i = 0; i < 20; ++i ) {
        printf( "%x\n", data[i] );
    }
    return 0;
}

I get the following (which is incorrect):

55
ffffff89
ffffffe5
ffffff83
ffffffec
18
ffffffc7
4
24
10
ffffff86
4
8
ffffffe8
22
ffffffff
ffffffff
ffffffff
ffffffc9
ffffffc3

If I leave out the memset( data, 0, sizeof( data ) ); line, then the right-most byte is correct, but some of the values still have the leading 1 bits (the ffffff prefix).

Does anyone have an explanation for the following?

  1. Why does using memset to clear my array result in an incorrect (or inaccurate) representation of the function?

  2. How are these bytes stored in memory: as ints? As chars? I don't quite understand what's going on here. (Clarification: what type of pointer would I use to traverse such data in memory?)

My immediate thought is that this is a result of x86 having instructions that don't end on a byte or half-byte boundary. But that doesn't make a whole lot of sense, and it shouldn't cause any problems.


Solution

  • Here is a much simpler version of what you were trying to do:

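    #include <stdio.h>

    void test();  /* defined as in the question */

    int main( void ) {
        /* View the function's machine code as raw bytes. Casting a function
           pointer to an object pointer is not strictly portable C, but it
           works with gcc on Linux. */
        unsigned char *data = (unsigned char *)test;
        int i;
        for( i = 0; i < 20; ++i ) {
            printf( "%02x\n", data[i] );  /* always two hex digits */
        }
        return 0;
    }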
    

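    The changes I made are: remove your superfluous buffer and read through a pointer to test instead, use unsigned char instead of char, and change the printf format to %02x so that every byte prints as two hex digits. Note that %02x on its own would not fix the 'negative' numbers coming out as ffffff89 and the like; that is fixed by making the data pointer unsigned. With a plain char (which is signed on this platform), any byte of 0x80 or above is a negative value, and it is sign-extended to a full int when it is passed to printf, so %x prints it as ffffffXX.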

    All x86 instructions are a whole number of bytes long, so they always end on byte boundaries. The compiler will also often insert extra padding instructions so that branch targets are aligned to 4-, 8- or 16-byte boundaries for efficiency.