c python-3.x character-encoding stack buffer-overflow

Problems with encoding during buffer overflow exploit

I am on Ubuntu Linux 16.04/Intel with ASLR turned off.

The following program is the target:

#include <stdio.h>
#include <string.h>

void func(char *name)
{
    char buf[100];
    strcpy(buf, name);  /* no bounds check: input longer than 99 characters overflows buf */
    printf("Welcome %s\n", buf);
}

int main(int argc, char *argv[])
{
    func(argv[1]);
    return 0;
}

It is built with:

$ gcc buf.c -o buf -fno-stack-protector -mpreferred-stack-boundary=2

I can successfully overflow the buffer and overwrite the return address when I use 7-bit (ASCII) characters, like this:

gdb-peda$ run $(python3 -c 'print("\x41" * 108)')

However, it does not work when I try to insert an 8-bit character (a byte of 0x80 or above):

gdb-peda$ run $(python3 -c 'print("\xc0" * 108)')

Some kind of UTF-8 encoding seems to happen along the way, so \xc0 becomes \xc3\x80.
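A small illustration of why only the high bytes change, assuming a UTF-8 locale (which the \xc3\x80 output suggests): in a Python 3 str literal, "\xc0" is the code point U+00C0, not a raw byte, and code points below 0x80 are the only ones whose UTF-8 form is a single identical byte.

```python
# In a Python 3 str literal, "\xc0" is the code point U+00C0 ("À"), not a raw byte.
# Printing it to a UTF-8 stdout emits that code point's UTF-8 encoding.
samples = {ch: ch.encode("utf-8") for ch in ("\x41", "\xc0")}
for ch, encoded in samples.items():
    print(hex(ord(ch)), encoded)
# 0x41 b'A'          -> below 0x80: one byte, same value
# 0xc0 b'\xc3\x80'   -> 0x80 and above: two bytes
```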

I also tried:

gdb-peda$ run $(python3 -c 'print(("\xc0".encode("latin1") * 108))')

This produces garbled input, and in any case the return address is still not overwritten.

I'm stuck, so any pointers would be much appreciated.

Solution

That's because, in Python 3, print() writes a str through the text layer of sys.stdout, which encodes it with sys.stdout.encoding (UTF-8 here) before writing the bytes. To avoid the encoding, write bytes directly to sys.stdout.buffer:
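To see the two paths side by side without touching the real terminal, the sketch below uses an io.TextIOWrapper over an io.BytesIO as a stand-in for sys.stdout (an assumption made for illustration; the real sys.stdout is also a TextIOWrapper whose .buffer is the binary stream).

```python
import io

# Stand-in for sys.stdout (assumes a UTF-8 locale): a text layer over a byte buffer.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8")

out.write("\xc0")           # text path: the str is encoded with out.encoding
out.flush()                 # push the encoded bytes down to raw
out.buffer.write(b"\xc0")   # binary path: the byte goes to the buffer unchanged

print(out.encoding)         # utf-8
print(raw.getvalue())       # b'\xc3\x80\xc0'
```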

run $(python3 -c '__import__("sys").stdout.buffer.write(b"\xc0" * 108)')
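The same approach as a standalone script, sketched under stated assumptions: the 104-byte offset is inferred from the 108-byte overflow in the question (100-byte buf plus 4-byte saved EBP on a 32-bit build), the return address 0xBFFFF0C0 is a made-up placeholder, and the file name payload.py is hypothetical. It also rejects bytes that cannot reach buf intact.

```python
import struct
import sys

# Hypothetical values: 104 filler bytes assumes the 108-byte overflow from the
# question (100-byte buf + 4-byte saved EBP, 32-bit build); confirm in gdb.
OFFSET = 104
RET_ADDR = 0xBFFFF0C0  # placeholder, not a real address from this binary

# Bytes that cannot survive: NUL ends strcpy(), whitespace splits an unquoted
# $(...) into several arguments, and trailing newlines are stripped.
BAD_BYTES = b"\x00\x09\x0a\x20"


def build_payload(offset: int = OFFSET, ret_addr: int = RET_ADDR) -> bytes:
    """Return filler followed by a little-endian 32-bit return address."""
    payload = b"\x90" * offset + struct.pack("<I", ret_addr)
    bad = [hex(b) for b in payload if b in BAD_BYTES]
    if bad:
        raise ValueError(f"payload contains unusable bytes: {bad}")
    return payload


if __name__ == "__main__":
    # Write to the underlying binary buffer so no text encoding is applied.
    sys.stdout.buffer.write(build_payload())
```

From gdb it could be run as run "$(python3 payload.py)"; quoting the substitution keeps bytes such as 0x20 from being treated as argument separators.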

When you run print("\xc0".encode("latin1") * 108) (equivalently, print(b"\xc0" * 108)), print() converts the bytes object to its string representation, so it prints the text b'\xc0\xc0...\xc0\xc0': the literal characters b, ', \, x, ... (bytes 0x62, 0x27, 0x5C, 0x78, ...).
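A quick check of that explanation: print() uses str() on the bytes object, which gives its repr, so the program receives ASCII text rather than the intended bytes. The 4-byte payload here is just a shortened example.

```python
# print() calls str() on a bytes object, which yields its repr as text.
payload = "\xc0".encode("latin1") * 4
text = str(payload)
print(text)                                # b'\xc0\xc0\xc0\xc0'
print([hex(ord(c)) for c in text[:4]])     # ['0x62', '0x27', '0x5c', '0x78']
# For the question's 108 bytes, the argument becomes 435 ASCII characters
# (2 for "b'", 4 per "\xc0", 1 for the closing quote).
print(len(str(b"\xc0" * 108)))             # 435
```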