global-variables llvm llvm-ir llvm-c++-api

Checking if a function argument is a global variable in LLVM


This question is similar to one posted a few years ago (Tracing global variables use in llvm), which went unanswered; I also could not implement what the comment on that post suggested. I want to elaborate on the question and post my attempt to see whether I can get some insights.

Assume I have a short test program like this:

#include <stdio.h>

char a[10];
void foo(char *buf, int test) {
  printf("Foo %p\n", (void *)buf);
  buf[1] = '1';
}
int main() {
  foo(a, 17);
}

This is translated into the following LLVM IR (abridged for brevity):

@test_global = dso_local global i32 17, align 4
@a = dso_local global [10 x i8] zeroinitializer, align 1

define dso_local void @foo(i8* %0, i32 %1) #0 {
  %3 = alloca i8*, align 8
  ...
  store i8* %0, i8** %3, align 8

define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  ...
  store i32 0, i32* %1, align 4
  call void @foo(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @a, i64 0, i64 0), i32 17)

At first glance the problem looks straightforward. For the instruction store i8* %0, i8** %3, align 8 in @foo, I need to check whether %0 points to global memory. Since %0 is the first argument in the call call void @foo(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @a, i64 0, i64 0), i32 17), I figured that checking the argument is the simplest way to do it.

However, I am a bit lost as to how to implement such a check. After studying the API, the best I could come up with is an LLVM pass that looks something like this:

void checkGlobal(Function &F, AAResults &AA) {
  SetVector<Value *> Ptrs; // worklist of pointer arguments
  for (auto &arg : F.args()) {
    if (arg.getType()->isPointerTy()) {
      Ptrs.insert(&arg);
    }
  }
  for (auto *item : Ptrs) {
    for (auto *user : item->users()) {
      if (auto *storeI = dyn_cast<StoreInst>(user)) {
        // This finds the LLVM IR instruction store i8* %0, i8** %3, align 8
        // and prints its value operand and its pointer operand.
        errs() << *storeI->getOperand(0) << " " << *storeI->getPointerOperand() << "\n";
      }
    }
  }
}

I pass the alias analysis results (AAResults) into this function in case I need alias analysis for the pointer i8* %0 in @foo (please let me know whether that is possible).

I feel I am at least heading in the right direction, but I am stuck at the moment. I have also tried approaching the problem at the LLVM Module level by listing the users of each global variable, like this:

// M.globals() iterates over the global variables of the module.
for (auto &gv : M.globals()) {
  for (auto *user : gv.users()) {
    errs() << *user << "\n";
  }
}

The code above prints something like i8* getelementptr inbounds ([10 x i8], [10 x i8]* @a, i64 0, i64 0) for the call to @foo. My idea was to use this information to record which functions have a global variable passed in as an argument and then solve the problem manually. However, this approach is unreliable and fragile.

Thank you for any suggestions, and please let me know if anything in my question is unclear.


Solution

  • In a sense, the purpose of a function is to divide the code into a part that does something and other parts that supply the data to operate on. This implies that, by design, the code inside a function does not know whether what is being supplied is a global variable.

    It can be so annoying when the code works as designed ;) You have a couple of possible workarounds.

    First, if the function cannot be called from outside the module, you may be able to connect each argument of the function to the corresponding operand of the CallInst/InvokeInst instructions that call that function, and see whether those operands are global variables. This may be a recursive process, and you have to think about what happens if some callers supply global variables and others don't, or if the information is incomplete because you cannot always decide whether a particular call instruction calls your function (for example, an indirect call through a function pointer).
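    The decision logic for this first workaround can be sketched without LLVM at all. The block below is a toy model under stated assumptions, not LLVM code: CallSite, OperandKind, Verdict and classifyArgument are hypothetical names invented for illustration. In a real pass, the call sites would come from the users() of the Function, each operand would be read with CallBase::getArgOperand, and "is a global" would be a check such as isa<GlobalVariable> on the operand after stripPointerCasts().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for IR concepts; none of these types come from LLVM.
enum class OperandKind { GlobalVariable, Other };

struct CallSite {
    std::string callee;                 // empty string models an indirect call
    std::vector<OperandKind> operands;  // kind of each operand after stripping casts
};

enum class Verdict { AlwaysGlobal, NeverGlobal, Mixed, Unknown };

// Classifies argument number argNo of `function` by looking at every call site.
// Returns Unknown whenever the set of callers cannot be fully determined.
Verdict classifyArgument(const std::vector<CallSite> &calls,
                         const std::string &function, std::size_t argNo,
                         bool externallyVisible) {
    assert(!function.empty() && "a function name is required");
    // Callers outside the module cannot be inspected.
    if (externallyVisible) return Verdict::Unknown;

    bool sawGlobal = false, sawOther = false;
    for (const CallSite &cs : calls) {
        // An indirect call might target `function`; the information is incomplete.
        if (cs.callee.empty()) return Verdict::Unknown;
        if (cs.callee != function) continue;
        if (argNo >= cs.operands.size()) return Verdict::Unknown;
        if (cs.operands[argNo] == OperandKind::GlobalVariable)
            sawGlobal = true;
        else
            sawOther = true;
    }
    if (sawGlobal && sawOther) return Verdict::Mixed;
    if (sawGlobal) return Verdict::AlwaysGlobal;
    if (sawOther) return Verdict::NeverGlobal;
    return Verdict::Unknown;  // no callers found
}
```

    The four-way verdict is a design choice made for this sketch: it keeps "some callers pass a global and some do not" separate from "the callers cannot all be seen", which are the two complications mentioned above. The recursive case (an operand that is itself an argument of the caller) is left out.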

    The other is to look at it at runtime. Generally, each platform locates global variables in a particular part of the address space, and you can find out what that is. If you have a pointer, you can cast it to an integer type wide enough to hold a pointer (such as uintptr_t) and compare it against the start and end of the address range where the linker places global variables.
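    A minimal sketch of the runtime comparison follows, assuming an ELF target (such as Linux) linked with GNU ld or lld, where the linker provides the symbols _etext and _end (described in the end(3) manual page). The helper looksLikeStaticStorage and the variable demoGlobal are hypothetical names used only for illustration. Other platforms expose these boundaries differently or not at all.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: symbols provided by the linker on ELF targets.
// Only their addresses are meaningful, not their contents.
extern "C" char _etext;  // first address past the program text
extern "C" char _end;    // first address past the uninitialized data (.bss)

// Hypothetical helper: returns true if p lies between the end of the text
// section and the end of .bss of the main executable, which is where
// writable globals and statics are placed under the assumptions above.
bool looksLikeStaticStorage(const void *p) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    const auto lo = reinterpret_cast<std::uintptr_t>(&_etext);
    const auto hi = reinterpret_cast<std::uintptr_t>(&_end);
    assert(lo < hi && "unexpected section layout");
    return addr >= lo && addr < hi;
}

char demoGlobal[10];  // hypothetical global, mirrors `a` from the question
```

    Calling looksLikeStaticStorage(demoGlobal) is expected to return true, while the address of a local variable or a heap allocation is expected to fall outside the range. The check only covers the main executable: globals defined in shared libraries are mapped elsewhere, and depending on the linker, read-only globals may sit outside this range too.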