
CodeQL dataflow query on a C program not finding a simple path from an assignment expression to a function's argument


I am new to CodeQL and have started learning about dataflow queries for C/C++ programs. The following is an excerpt of a C program that I want to analyse:

int main(int argc, char * argv[])
{
    unsigned short size, x, y;
    int r1, r2;
    x = atoi(argv[1]);// one dim of the data
    y = atoi(argv[2]);//other dim of the data
    size = x*y; //total size of the data
    r1 = MyVuln(size*sizeof(char));

    r2 = MyVuln(x*sizeof(char));
...
// some code
...

    return 0;
}

In the above example, I want to detect whether the MyVuln function is called with size as its argument. size is defined by an AssignExpr whose RValue is the result of a multiplication. The following is the CodeQL query that I wrote:

/*
@kind path-problem
*/

import cpp
import semmle.code.cpp.dataflow.new.DataFlow
//import DataFlow::PathGraph

from Function myvuln, FunctionCall fc, AssignExpr ab 
where
myvuln.hasGlobalName("MyVuln")
and fc.getTarget() = myvuln
and ab.getLValue().getType().getUnspecifiedType() instanceof IntegralType
and ab.getRValue() instanceof MulExpr
and exists (DataFlow::Node src, DataFlow::Node sink| 
src.asExpr() = ab.getLValue()
and sink.asExpr() = fc.getArgument(0)
and DataFlow::localFlow(src, sink)
)
select fc, "MyVuln with Arithmetic arg at " + fc.getLocation().toString()

The query returns no results (I am using CodeQL with VS Code). I checked that a smaller partial query detects the expression corresponding to the definition of size, and that works. I also checked that the query finds the calls to MyVuln, and that works too. Only when I add the dataflow part do I get no results. This type of query seems pretty straightforward, but I have no clue where I have gone wrong or what I am missing. Any help is highly appreciated. Thanks.


Solution

  • Based on the suggestions from @Marcono1234, the following query worked for the problem described in the question above.

    /*
    @kind path-problem
    */
    
    import cpp
    import semmle.code.cpp.dataflow.new.DataFlow
    import semmle.code.cpp.dataflow.new.TaintTracking
    //import DataFlow::PathGraph
    
    from Function myvuln, FunctionCall fc, AssignExpr ab, DataFlow::Node src, DataFlow::Node sink
    where
    // getting the call that I am interested in as sink
    myvuln.hasGlobalName("MyVuln")
    and fc.getTarget() = myvuln
    // getting the "interesting" parameter that will flow into the parameter of MyVuln
    and ab.getLValue().getType().getUnspecifiedType() instanceof IntegralType
    and ab.getRValue() instanceof MulExpr
    // The source must be the RValue. My earlier query used the LValue, which was the problem:
    // the source has to be the expression that computes the value that later flows into
    // the argument of MyVuln, and that is the RValue expression.
    and src.asExpr() = ab.getRValue()
    and sink.asExpr() = fc.getArgument(0)
    and TaintTracking::localTaint(src, sink)
    
    select fc, sink.toString(), "MyVuln with Arithmetic operation at " + fc.getLocation().toString()
    
    

    As I am learning CodeQL, I also wanted to understand other ways of doing this. So I explored the same problem as plain dataflow, extracting the size variable from the expression size*sizeof(char) by using `getAChild*()`. This requires changes to the above query in two places, as follows:

    and sink.asExpr() = fc.getArgument(0).getAChild*()
    //and TaintTracking::localTaint(src, sink)
    and DataFlow::localFlow(src, sink)
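
For anyone reproducing this, a minimal self-contained C file along the following lines could be used to build a small test database for the queries above. It is only a sketch: the body of MyVuln is a hypothetical stand-in (the question never shows it), and the function name analyse is an assumption introduced so that the logic of main() from the question can be called directly with two strings.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in: the original post does not show MyVuln's body. */
int MyVuln(size_t n)
{
    return (int)n;
}

/* Mirrors the body of main() from the question; "analyse" is an assumed name.
   The two strings play the role of argv[1] and argv[2]. */
int analyse(const char *a, const char *b)
{
    unsigned short size, x, y;
    int r1, r2;

    x = atoi(a);   /* one dim of the data */
    y = atoi(b);   /* other dim of the data */
    size = x * y;  /* AssignExpr whose RValue is a MulExpr: the source */

    /* size is derived from x*y, so this call is the one the query is meant to flag. */
    r1 = MyVuln(size * sizeof(char));

    /* x is assigned from atoi(), not from a multiplication, so this call
       is presumably not flagged. */
    r2 = MyVuln(x * sizeof(char));

    return r1 + r2;
}
```

With the taint-tracking version of the query, only the first MyVuln call would be expected in the results, since the only AssignExpr with a MulExpr on its right-hand side is the assignment to size.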