
How does a fuzzer deal with invalid inputs?


Suppose that I have a program that takes a pointer as its input. Without prior knowledge about the structure of the pointee, how does a fuzzer create valid inputs that actually reach the internals of the program? To make this more concrete, imagine an artificial C program:

int myprogram(unknow_pointer *input) {
  /* input->name is dereferenced without any validation */
  printf("%s", input->name);
  return 0;
}

In some situations, the tested program first checks the input format. If the input format is not good, it raises an exception. In such situations, how can a fuzzer reach program points beyond that exception-raising statement?


Solution

  • Most fuzzers don't know anything about the internal structure of the program. Different fuzzers deal with this in various ways:

    1. Don't deal with it at all. Just throw random inputs at the program and hope that one of them passes some or all of the checks (for example, radamsa).
    2. Mutate a valid input: take a known valid input and mutate it (flip bits, remove parts, add parts, etc.). In many cases the result will be valid enough to pass some or all of the checks. For example, to fuzz VLC, you would give the fuzzer a valid movie file, and it would feed mutations of that file to VLC. These are often called mutation-based fuzzers (for example, zzuf).
    3. If you have prior knowledge of the input's structure, build a model of the input and then mutate specific fields within it. A big advantage of this method is the ability to handle very specific kinds of fields: checksums, hashes, sizes, etc. These are often called generation-based fuzzers (for example, SPIKE, Sulley and their successors, Peach).
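    As a rough illustration of approaches 2 and 3, here is a minimal sketch in C. The helper names (`flip_random_bits`, `make_mutant`, `fix_checksum`) and the input format (last byte equals the sum of all preceding bytes modulo 256) are assumptions made up for this example; they are not taken from zzuf, Peach, or any other real fuzzer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Flip `nflips` randomly chosen bits in buf (hypothetical helper). */
static void flip_random_bits(uint8_t *buf, size_t len, unsigned nflips) {
    assert(buf != NULL && len > 0);
    for (unsigned i = 0; i < nflips; i++) {
        size_t byte = (size_t)rand() % len;   /* which byte to touch */
        unsigned bit = (unsigned)rand() % 8u; /* which bit inside it */
        buf[byte] ^= (uint8_t)(1u << bit);
    }
}

/* Approach 2: copy a known-valid seed into `out`, then mutate the copy.
   The seed itself is left untouched so it can be reused. */
static void make_mutant(const uint8_t *seed, size_t len,
                        uint8_t *out, unsigned nflips) {
    memcpy(out, seed, len);
    flip_random_bits(out, len, nflips);
}

/* Approach 3: a model-aware fix-up. ASSUMED format: the last byte is the
   sum of all preceding bytes modulo 256. Recomputing it after mutation
   lets the mutant pass the target's checksum validation. */
static void fix_checksum(uint8_t *buf, size_t len) {
    assert(buf != NULL && len >= 1);
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    buf[len - 1] = sum;
}
```

    A driver would call `make_mutant` on the seed, optionally `fix_checksum` on the result, and then hand the buffer to the target. Without the fix-up step, almost every mutant of a checksummed format is rejected at the first check, which is why knowing the input model pays off.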

    However, in recent years a new kind of fuzzer has emerged: feedback-based fuzzers. These fuzzers mutate an input (valid or not) and, based on feedback they receive from the fuzzed program, decide how and what to mutate next. The feedback is collected by instrumenting the program's execution, either by injecting tracing code at compile time, by patching tracing code into the program at run time, or by using hardware tracing mechanisms. First among them is AFL (American Fuzzy Lop).
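    The feedback idea can be sketched with a toy example in C. Everything here is hypothetical: `target` is a made-up program that only reaches its "deep" code when the input is `FUZZ`, the `cov` array stands in for the tracing code a real tool would inject, and the deterministic mutation schedule is a simplification chosen so the run is reproducible. It is not how AFL is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INPUT_LEN 4
#define NUM_STAGES 4

/* Toy target with four nested format checks. The cov[] writes play the
   role of injected instrumentation. Returns 1 if the deep code is hit. */
static int target(const uint8_t *in, uint8_t cov[NUM_STAGES]) {
    memset(cov, 0, NUM_STAGES);
    if (in[0] != 'F') return 0;
    cov[0] = 1;
    if (in[1] != 'U') return 0;
    cov[1] = 1;
    if (in[2] != 'Z') return 0;
    cov[2] = 1;
    if (in[3] != 'Z') return 0;
    cov[3] = 1;
    return 1;
}

/* Number of checks the last execution got past. */
static int count_cov(const uint8_t cov[NUM_STAGES]) {
    int n = 0;
    for (int i = 0; i < NUM_STAGES; i++) n += cov[i];
    return n;
}

/* Feedback loop: mutate one byte of the best input so far and keep the
   mutant only if it covers more than anything seen before.
   Returns the number of executions used, or -1 if the budget ran out. */
static long fuzz(uint8_t best[INPUT_LEN], long budget) {
    assert(best != NULL && budget > 0);
    uint8_t cov[NUM_STAGES];
    target(best, cov);
    int best_cov = count_cov(cov);

    for (long i = 0; i < budget; i++) {
        uint8_t cand[INPUT_LEN];
        memcpy(cand, best, INPUT_LEN);
        /* Deterministic schedule: cycle through positions and values. */
        cand[i % INPUT_LEN] = (uint8_t)((i / INPUT_LEN) % 256);

        int reached = target(cand, cov);
        int c = count_cov(cov);
        if (c > best_cov) {            /* new coverage: keep this input */
            best_cov = c;
            memcpy(best, cand, INPUT_LEN);
        }
        if (reached) return i + 1;
    }
    return -1;
}
```

    Starting from four zero bytes, this loop needs at most 4 × 1024 = 4096 executions, because every window of 1024 iterations tries all single-byte replacements and each kept input passes one more check. A fuzzer with no feedback would be searching all 2^32 four-byte inputs for the same result.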