linux-kernel, x86-64, system-calls, instrumentation, intel-pin

Why is Intel Pin not able to instrument open syscall?


I am trying to build a pintool that should be able to instrument an open() syscall that targets a specific file/directory and replace the file path argument with another value.

For example, here is a very simple program that I want to instrument:

    #include <iostream>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    
    using namespace std;
    
    int main(int argc, char **argv)
    {
        int i = open("/home/preet_derasari/important.txt", O_RDONLY);
        cout << "fid: " << i << endl;
    }

In this example I want Pin to change the file path from /home/preet_derasari/important.txt to /home/preet_derasari/dummy.txt. In order to do this I wrote a very simple pintool after referring to some example pintools and Pin APIs:

    #include "pin.H"
    #include <cstring>   // strcmp
    #include <iostream>
    #include <fstream>
    #include <syscall.h>
    #include <string>
    using namespace std;
    
    INT32 Usage()
    {
        // Help text shown when PIN_Init fails to parse the command line.
        cout << "This tool intercepts open() syscalls on a specific file " << endl
             << "and replaces the file path argument with a dummy path." << endl
             << endl;
    
        cout << KNOB_BASE::StringKnobSummary() << endl;
    
        return -1;
    }
    
    void SyscallEntry(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, void *v)
    {
        ADDRINT sysNum = PIN_GetSyscallNumber(ctxt, std);
        cout << "entered syscall: " << sysNum << endl;
        if(sysNum == SYS_open)
        {
            cout << "open encountered!" << endl;
            const char *path = (const char *)PIN_GetSyscallArgument(ctxt, std, 0);
            cout << "Original File Path: " << path << endl;
            if (strcmp(path, "/home/preet_derasari/important.txt") == 0)
            {
                // Static storage: the kernel reads this pointer after SyscallEntry
                // returns, so it must not point into a local string that is destroyed.
                static const char pathDummy[] = "/home/preet_derasari/dummy.txt";
                PIN_SetSyscallArgument(ctxt, std, 0, (ADDRINT)pathDummy);
                cout << "Dummy File Path: " << pathDummy << endl;
            }
        }
    }
    
    int main(int argc, char* argv[])
    {
        cout << "Open Syscall Value: " << SYS_open << endl;
    
        if (PIN_Init(argc, argv))
        {
            return Usage();
        }
    
        cout << "===============================================" << endl;
        cout << "This application is instrumented by MyPinTool" << endl;
        cout << "===============================================" << endl;
    
        PIN_AddSyscallEntryFunction(SyscallEntry, 0);
    
        // Start the program, never returns
        PIN_StartProgram();
    
        return 0;
    }

I run the pintool with this command:

    ../../../pin -t obj-intel64/MY_pin.so -- test

where MY_pin.so is the pintool shared object library and test is the sample code mentioned above.

The output just baffles me because Pin is instrumenting all syscalls except open:

    Open Syscall Value: 2
    ===============================================
    This application is instrumented by MyPinTool
    ===============================================
    entered syscall: 12
    entered syscall: 158
    entered syscall: 21
    entered syscall: 257
    entered syscall: 5
    entered syscall: 9
    entered syscall: 3
    entered syscall: 257
    entered syscall: 0
    entered syscall: 17
    entered syscall: 17
    entered syscall: 17
    entered syscall: 5
    entered syscall: 9
    entered syscall: 17
    entered syscall: 17
    entered syscall: 17
    entered syscall: 9
    entered syscall: 9
    entered syscall: 9
    entered syscall: 9
    entered syscall: 9
    entered syscall: 3
    entered syscall: 158
    entered syscall: 10
    entered syscall: 10
    entered syscall: 10
    entered syscall: 11
    entered syscall: 12
    entered syscall: 12
    entered syscall: 257
    entered syscall: 5
    entered syscall: 9
    entered syscall: 3
    entered syscall: 3

As you can see, Pin instruments every syscall except open, i.e. syscall number 2 on x86_64.

An interesting observation is that the output from my test program (cout << "fid: " << i << endl;) never appears, which makes me wonder whether Pin is doing something weird with the open syscall.

Specifications:

  • Pin version - pin-3.21-98484-e7cd811fd
  • gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
  • ISA: x86_64
  • CPU: AMD Ryzen 7 1700X Eight-Core Processor

Can someone please help me understand why this is happening?


Solution

  • strace cat foo shows you that programs don't use the old open(2) system call anymore:

    ...
    openat(AT_FDCWD, "foo", O_RDONLY)       = 3
    ...
    

    __NR_openat is 257, which your Pin tool observed 3 times. Apparently even the open() libc wrapper function uses the openat Linux system call internally. (The __NR_open = 2 system call still works; the kernel has code to pass its arguments to the current implementation. I don't know which is more efficient: the kernel may just set up an AT_FDCWD argument and call sys_openat(), which then has to decode it again, much as glibc does in user space.)
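The pathname is argument 0 of open(2) but argument 1 of openat(2), so a tool that wants to catch both has to pick the argument index per syscall. A minimal sketch of that lookup follows. `PathArgIndex` is a hypothetical helper name (not part of Pin), and it only covers these two syscalls, not other path-taking calls such as creat.

```cpp
#include <sys/syscall.h>  // SYS_open, SYS_openat (Linux)

// Hypothetical helper: returns the index of the pathname argument for the
// syscalls that open a file by path, or -1 for every other syscall.
int PathArgIndex(long sysNum)
{
#ifdef SYS_open  // not defined on every architecture
    if (sysNum == SYS_open) return 0;    // open(pathname, flags, mode)
#endif
    if (sysNum == SYS_openat) return 1;  // openat(dirfd, pathname, flags, mode)
    return -1;
}
```

Inside SyscallEntry, the returned index could then be passed to PIN_GetSyscallArgument(ctxt, std, idx) and PIN_SetSyscallArgument(ctxt, std, idx, ...) in place of the hard-coded 0.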


    The open(2) man page also documents openat(2).

    The dirfd argument is used in conjunction with the pathname argument as follows:

    • If the pathname given in pathname is absolute, then dirfd is ignored.

    • If the pathname given in pathname is relative and dirfd is the special value AT_FDCWD, then pathname is interpreted relative to the current working directory of the calling process (like open()).

    • ...
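The dirfd rules quoted above can be checked with a small sketch. The helper names `SameFile` and `OpenatMatchesOpen` are made up for illustration; the idea is to open the same path through open() and openat() and compare the device and inode numbers of the two descriptors.

```cpp
#include <fcntl.h>     // open, openat, AT_FDCWD, O_RDONLY
#include <sys/stat.h>  // fstat, struct stat
#include <unistd.h>    // close

// True if both descriptors are valid and refer to the same file
// (same device and inode).
bool SameFile(int fdA, int fdB)
{
    struct stat a, b;
    if (fdA < 0 || fdB < 0) return false;
    if (fstat(fdA, &a) != 0 || fstat(fdB, &b) != 0) return false;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

// Opens `path` once with open() and once with openat(dirfd, ...), then
// reports whether both calls resolved to the same file.
bool OpenatMatchesOpen(int dirfd, const char *path)
{
    int viaOpen = open(path, O_RDONLY);
    int viaOpenat = openat(dirfd, path, O_RDONLY);
    bool same = SameFile(viaOpen, viaOpenat);
    if (viaOpen >= 0) close(viaOpen);
    if (viaOpenat >= 0) close(viaOpenat);
    return same;
}
```

With an absolute path such as "/", the result should be true for any dirfd, even an invalid one like -1, because dirfd is ignored. With a relative path such as ".", it should be true for AT_FDCWD and false for -1, since openat() then fails.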

    openat / linkat and so on, when used with an fd from open(O_DIRECTORY), let programs like find avoid TOCTOU races, and/or let multi-threaded programs avoid having to actually chdir (because there's only one CWD per process, not per thread.)
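As a rough illustration of that pattern, the sketch below opens a directory once with O_DIRECTORY and then resolves several names against that descriptor, without ever calling chdir. `CountOpenableIn` is a hypothetical function written for this example, not something find or libc provides.

```cpp
#include <fcntl.h>   // open, openat, O_RDONLY, O_DIRECTORY
#include <unistd.h>  // close

// Opens `dir` once, then tries to open each of the `count` entries in
// `names` relative to that directory descriptor. Returns how many opens
// succeeded, or -1 if the directory itself could not be opened.
int CountOpenableIn(const char *dir, const char *const *names, int count)
{
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) return -1;

    int opened = 0;
    for (int i = 0; i < count; ++i) {
        // Resolved relative to dirfd, not the process-wide CWD, so other
        // threads are unaffected and later renames of the directory's
        // path do not redirect the lookup.
        int fd = openat(dirfd, names[i], O_RDONLY);
        if (fd >= 0) {
            ++opened;
            close(fd);
        }
    }
    close(dirfd);
    return opened;
}
```

Because every lookup goes through the same dirfd, the directory is pinned for the whole loop. That is the property that helps against TOCTOU races.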

    Using them with AT_FDCWD has no advantage or disadvantage vs. old-style open(2).