Search code examples
logicformal-verificationmodel-checkingsymbolic-execution

What a Symbolic Model Looks Like


I am trying to understand how Symbolic Execution engines work. This paper surveys the techniques using C. They mention about symbolic memory:

3.1 Fully Symbolic Memory

At the highest level of generality, an engine may treat memory addresses as fully symbolic...

Representing memory in a compact form. This approach was taken in [32], which maps symbolic – rather than concrete – address expressions to data, representing the possible alternative states resulting from referencing memory using symbolic addresses in a compact, implicit form

This paper also goes a little into symbolic heaps for Java:

4.1 Symbolic State Representation

A symbolic state s consists of a symbolic heap H, the valuation of the primitive typed fields, the path condition PC and the program counter.

I am wondering what in practice this means, this symbolic heap. To me this means that all of the data structures used in the symbolic execution will be using symbolic memory rather than the actual memory. Which means structures like arrays need to have symbolic models. I am wondering at a high level what these models look like. I don't see how you can "symbolically model the array length". In my head it is an integer so it will be an integer value. But as a symbolic value, wondering what this means. I don't see it yet. Wondering if one could explain at a high level how you would symbolically model some data structures like arrays or a struct with some different field values, so it would be useful in symbolic execution.

This older paper mentions:

One can also define an alternative "symbolic execution" semantics for a programming language where the real data objects need not be used but can be represented by arbitrary symbols.

Not sure how this looks.


Solution

  • A fully documented open source example of a symbolic execution engine is JPF - Symbolic Pathfinder. It executes on the Java virtual machine level. It should answer your questions also on complexer data structures like arrays or arrays of objects.

    A very good publication is here: “Symbolic PathFinder: Integrating Symbolic Execution with Model Checking for Java Bytecode Analysis” in Automated Software Engineering Journal 20(3) 2013 https://ti.arc.nasa.gov/publications/5269/download/

    Here you can see a detailed example (section 4.9) on how the concrete code is "rewritten" and transformed into symbolic code. Especially how the stack and heap memory and the code-conditions depending on this symbolic memory is handled (Fig. 6+7). You can't simply separate the memory from the conditional path execution.

    Very simplified: The conditions are replaced by a loop over all "symbolic" branches (nondeterministic choice). The memory is replaced by symbolic values (like you mentioned it - propagated here via so called attributes) and constraints on these values depending on the branch-loop. A constraint solver is used to further reduce impossible branches (backtracking) and to solve the constraints finally.

    Another very good hands-on example for .Net-code is to run Microsoft SpecExplorer (XRTs) in the full symbolic mode using "Combination.KeepUnexpanded". In the resulting explored traversed path graph you can see a nice example how the symbolic memory and constraints are represented. Unfortunately XRTs is not open source.

    P.S.: Indeed The representation of the symbolic variables is one of the big challenges. Here are some very good publications: