c python-3.x character-encoding stack buffer-overflow

Problems with encoding during buffer overflow exploit


I am on Ubuntu Linux 16.04/Intel with ASLR turned off.

This is the program being exploited:

#include <stdio.h>
#include <string.h>

void func(char *name)
{
    char buf[100];
    strcpy(buf, name);
    printf("Welcome %s\n", buf);
}

int main(int argc, char *argv[])
{
   func(argv[1]);
   return 0;
}

It is built with:

$ gcc buf.c -o buf -fno-stack-protector -mpreferred-stack-boundary=2

I can successfully overflow the buffer and overwrite the return address when the payload consists of 7-bit (ASCII) characters, as below:

gdb-peda$ run $(python3 -c 'print("\x41" * 108)')

However, it doesn't work correctly when I try to insert a character above 0x7f, which needs all 8 bits:

gdb-peda$ run $(python3 -c 'print("\xc0" * 108)')

Some kind of UTF-8 encoding seems to be applied along the way, so \xc0 becomes \xc3\x80.
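The suspected encoding step can be checked in isolation. This is a minimal sketch, assuming the terminal locale is UTF-8; the variable names are only illustrative.

```python
# "\xc0" in a Python 3 str is the code point U+00C0, not a single byte.
ch = "\xc0"

# What a text stream emits when its encoding is UTF-8 (assumed here).
utf8_bytes = ch.encode("utf-8")

# The single raw byte the payload actually needs.
latin1_bytes = ch.encode("latin-1")

print(utf8_bytes)    # b'\xc3\x80'
print(latin1_bytes)  # b'\xc0'

# 108 characters therefore turn into 216 bytes once UTF-8 encoded.
print(len((ch * 108).encode("utf-8")))  # 216
```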

I also tried running:

gdb-peda$ run $(python3 -c 'print(("\xc0".encode("latin1") * 108))')

This produces garbage. In any case, the return address is not overwritten successfully.

I'm stuck, and any pointers would be much appreciated.


Solution

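  • That's because print() encodes str objects with sys.stdout.encoding (UTF-8 in your case, judging by the \xc3\x80 output) before writing the bytes, so the code point U+00C0 comes out as two bytes. You can write bytes directly to sys.stdout.buffer to avoid the encoding step: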

    run $(python3 -c '__import__("sys").stdout.buffer.write(b"\xc0" * 108)')
    

    When you do print("\xc0".encode("latin1") * 108) (equivalently, just print(b"\xc0" * 108)), print() calls str() on the bytes object, so what gets written is its repr: b'\xc0\xc0...\xc0\xc0' as literal text (the characters b, ', \, x, ..., or the bytes 0x62, 0x27, 0x5C, 0x78, ...).
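The difference between the two approaches can be compared side by side. This is a rough sketch: `emit_raw` is a hypothetical helper, not part of any library, and the write is captured in an in-memory stream so the result can be inspected.

```python
import io
import sys


def emit_raw(payload: bytes, stream=None) -> int:
    """Write payload bytes unmodified and return the number of bytes written.

    Hypothetical helper; defaults to the binary layer beneath sys.stdout.
    """
    if stream is None:
        stream = sys.stdout.buffer
    return stream.write(payload)


payload = b"\xc0" * 108  # same fill byte and length as in the question

# What print(payload) would send: the repr text, not the raw bytes.
as_printed = str(payload).encode("ascii")

# Capture the raw write in memory to compare the two.
sink = io.BytesIO()
emit_raw(payload, sink)
raw = sink.getvalue()

print(len(as_printed), len(raw))  # 435 108
print(as_printed[:4])             # b"b'\\x"
```

Calling `emit_raw(payload)` with no stream argument writes to `sys.stdout.buffer`, which is what the one-liner above does inside `$(...)`.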