Search code examples
dependenciescall-graphsoot

Points-to analysis - A definition


I'm looking to perform some dependence analysis using a call-graph that I will build using the Soot framework. I read in a guide that using 'points-to' analysis can improve the precision of a call graph. What exactly is 'points-to' analysis and how could it improve the accuracy of a call-graph?


Solution

  • A key problem in understanding data flow is know what date each pointer can reference. If you know nothing about a pointer to an object, and that object is updated via the pointer (e.g., p.=3) then it is possible that any object in your entire system might be modified. If you know that p references a specific object O1, then you know that only O1 might be modified. So knowledge of what p can point to, is important in understand side effects and the scope of such effects.

    Now, imagine you have pointers to functions. If you don't know what a function pointer p points to, and a function call is made indirectly through p, then any function might get called, and the side effects could be any side effect from any function. If you know that p can only point to foo, then only the side effects that foo might cause can occur.

    When computing a call graph, some function calls clearly only go to one place. Some function calls can go to a variety of places because they are in fact function calls via pointers; "method" calls in OO languages are often like this and this is done on purpose to support polymorphism.

    If you don't do points-to analysis, you can't possibly have done function-pointer points-to analysis. That means your constructed call graph says a node bar might call many possible functions through its pointer p, which means there are many side effects you have to worry about.

    A precise points-to analysis leads to precise function-points-to analysis, which leads to precise side effect analysis, which leads to better understanding of what the code can do. Of course, precision is relative; and it is harder to get "very precise" points to analysis. In the limit, it is impossible to get perfect points-to anlaysis; you are analyzing turing machines.

    You can see some more discussion on flow analysis and an example of a "more precise" call graph at http://www.semdesigns.com/Products/DMS/FlowAnalysis.html