How can I intercept a function from glibc and print the values of its parameters?

In glibc, the function _IO_new_fopen() is called by the fopen() library call. If I am running the following code, is there any approach that allows me to intercept _IO_new_fopen() whenever it gets called and print out the values of its parameters?

For kernel functions, this can be achieved with a Jprobe, and I am looking for a similar mechanism for the functions in glibc. LD_PRELOAD is a related mechanism that allows us to replace a glibc function with our own function, but it does not help to achieve my goal.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main(void){

    char buff[100];
    int i, r1;
    FILE *f1 = fopen("text.txt", "wb");

    for(i = 0; i < 100; i++){
            buff[i] = 'b';
    }
    assert(f1);
    r1 = fwrite(buff, 1, 100, f1);
    printf("wrote %d elements\n", r1);
    fclose(f1);
}

Solution

In glibc, fopen is a [public] symbol alias to [local symbol] _IO_new_fopen (i.e. they are identical), so, technically speaking, fopen doesn't call _IO_new_fopen. The two names refer to the same function.
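To make the alias idea concrete, here is a minimal sketch using GCC's alias attribute. The names demo_new_open and demo_open are hypothetical stand-ins for _IO_new_fopen and fopen; glibc sets up its real aliases through its own internal macros, so treat this only as an illustration of the concept.

```c
/* demo_new_open stands in for the internal implementation (hypothetical name) */
int
demo_new_open(int x)
{
    return x + 1;
}

/* demo_open is a second name for the same code, not a wrapper around it */
int demo_open(int x) __attribute__((alias("demo_new_open")));
```

Both names resolve to the same address, which is why there is no separate "inner" call to hook.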

If you are interested in intercepting the call as shown in your example (i.e. from the app), then using LD_PRELOAD [along with dlopen, dlsym, etc.] and defining your own fopen will work. I've done this before in my own code.

You may just be missing an entry point (i.e. you may need to define/intercept multiple symbols): fopen, fopen64, _IO_file_fopen, _IO_fopen, etc. If you do readelf -s on your executable, the simple fopen may show up as something else. I'm guessing that you'll see fopen64. Also, you may need to account for symbol versioning.
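If symbol versioning turns out to matter, glibc provides dlvsym alongside dlsym. The helper below is a rough sketch: resolve_next is a hypothetical name, and the version string you pass in is an assumption you would have to read off readelf -s for your own platform.

```c
#define _GNU_SOURCE          /* exposes RTLD_NEXT and dlvsym() in <dlfcn.h> */
#include <dlfcn.h>
#include <stddef.h>

/* resolve_next -- hypothetical helper: find the next definition of `name`
   after the calling object. If `version` is non-NULL, ask for that symbol
   version first and fall back to the plain lookup when it is absent. */
void *
resolve_next(const char *name, const char *version)
{
    void *sym = NULL;

    if (version != NULL)
        sym = dlvsym(RTLD_NEXT, name, version);
    if (sym == NULL)
        sym = dlsym(RTLD_NEXT, name);
    return sym;
}
```

An interceptor could call resolve_next("fopen", NULL) to start with and only pass a version string if the plain lookup picks the wrong definition. On older glibc you may need to link with -ldl.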

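You won't be able to intercept internal calls within glibc to fopen, because they are bound directly inside the library and bypass the dynamic symbol lookup that LD_PRELOAD relies on. But intercepting those usually isn't that useful, so more details on what you are trying to do might be required.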

You can also look at fopencookie as a way to intercept the underlying read(2), write(2), etc. syscalls.
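As a rough sketch of the fopencookie route, the code below wraps a file descriptor in a stream whose write callback logs every buffer that stdio flushes. The names spy_cookie, spy_write, spy_close and spy_open are made up for this example; only fopencookie and cookie_io_functions_t come from glibc.

```c
#define _GNU_SOURCE          /* fopencookie() is a GNU extension */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cookie: the wrapped descriptor plus a running byte count */
struct spy_cookie {
    int fd;
    size_t total_written;
};

/* Called by stdio whenever it flushes buffered data to the "file" */
static ssize_t
spy_write(void *c, const char *buf, size_t size)
{
    struct spy_cookie *ck = c;
    ssize_t n;

    dprintf(STDERR_FILENO, "spy_write: buf=%p size=%zu\n",
        (const void *) buf, size);

    n = write(ck->fd, buf, size);
    if (n < 0)
        return 0;           /* a cookie write function reports errors as 0 */
    ck->total_written += (size_t) n;
    return n;
}

/* Called from fclose(); the cookie itself stays owned by the caller */
static int
spy_close(void *c)
{
    struct spy_cookie *ck = c;

    return close(ck->fd);
}

/* spy_open -- create a write-only stream on top of ck->fd */
FILE *
spy_open(struct spy_cookie *ck)
{
    cookie_io_functions_t io = {
        .read = NULL,
        .write = spy_write,
        .seek = NULL,
        .close = spy_close,
    };

    return fopencookie(ck, "w", io);
}
```

Note that the buffer seen by spy_write is whatever stdio decides to flush, not necessarily the pointer the application handed to fwrite.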

UPDATE:

Specifically, I am trying to print the address and the content of the buffer that is used by fread()/fwrite().

Easy to do. Details below ...

I think it should be able to be done by adding "printk()" somewhere and rebuild libc.so.6, but is there any approach that is more convenient (i.e. without rebuilding libc.so.6)?

No need to rebuild libc; LD_PRELOAD will handle it. Remember that printk is in the kernel. No need to go there [and it probably wouldn't work].

The address you want is the one passed to fread, not the buffer passed to read(2). The buffer passed to read(2) is normally stdio's internal buffer, which belongs to the FILE struct, so it wouldn't tell you much. Otherwise, you could just run the whole program under strace(1) [or write your own custom version that uses ptrace(2)].

You can intercept, trace, breakpoint on, etc. an arbitrary number of functions. You just need to create your own shared library (.so) and set LD_PRELOAD to it.

Here is a sample interceptor function [for fread]:

#define _GNU_SOURCE     // needed for RTLD_NEXT
#include <dlfcn.h>
#include <stdio.h>

// trapme -- put gdb breakpoint on this function
__attribute__((__noinline__)) void
trapme(void)
{

    // prevent the function call to this from being optimized away
    __asm__ __volatile__("" ::: "memory");
}

// dumpme -- dump out a buffer
void
dumpme(const void *buf,size_t xlen)
{

    // dump data in whatever format you'd like ...
}

// fread -- intercept fread calls
size_t
fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    static size_t
    (*fread_real)(void *ptr,size_t size,size_t nmemb,FILE *stream) = NULL;
    size_t xlen;

    printf("fread_fake: ENTER ptr=%p size=%zX nmemb=%zu stream=%p\n",
        ptr,size,nmemb,(void *) stream);

    // do trap on some suspicious activity ...
#if 0
    if (ptr == ...)
        trapme();
#endif

    // locate the real symbol in glibc
    if (fread_real == NULL)
        fread_real = dlsym(RTLD_NEXT,"fread");
    // abort if fread_real is still null ...

    xlen = fread_real(ptr,size,nmemb,stream);

    // dump out the data
    if (xlen > 0)
        dumpme(ptr,xlen * size);

    // do trap on some suspicious activity ...
#if 0
    if (xlen == 372)
        trapme();
#endif

    printf("fread_fake: EXIT xlen=%zu\n",xlen);

    return xlen;
}

That's the basic mechanism. You can add whatever [nefarious] things you desire. You can get fancy and add some logic that traps if a buffer is in some [bad] range or has funny contents. That is, similar to a condition on a gdb breakpoint. So, you can use this to trigger and drop into gdb using much more complex tests than you could with gdb alone [to find really hard to find bugs].
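For the dump step, one possible body for dumpme is a plain hex/ASCII dump to stderr. This is only a sketch: dump_to is a helper name introduced here, and the 16-bytes-per-line layout is an arbitrary choice.

```c
#include <ctype.h>
#include <stdio.h>

// dump_to -- write a hex/ASCII dump of buf[0..xlen) to out, 16 bytes per line
void
dump_to(FILE *out, const void *buf, size_t xlen)
{
    const unsigned char *p = buf;
    size_t off, i;

    for (off = 0; off < xlen; off += 16) {
        // offset of this row within the buffer
        fprintf(out, "%08zx ", off);

        // hex column, padded so the ASCII column always lines up
        for (i = 0; i < 16; i++) {
            if (off + i < xlen)
                fprintf(out, " %02x", p[off + i]);
            else
                fputs("   ", out);
        }

        // ASCII column, with '.' for non-printable bytes
        fputs("  |", out);
        for (i = 0; i < 16 && off + i < xlen; i++)
            fputc(isprint(p[off + i]) ? p[off + i] : '.', out);
        fputs("|\n", out);
    }
}

// dumpme -- dump out a buffer
void
dumpme(const void *buf, size_t xlen)
{
    dump_to(stderr, buf, xlen);
}
```

Writing to stderr keeps the dump separate from whatever the traced program prints on stdout.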

You can also monitor the fopen and remember the filename, etc.

Getting this mechanism working isn't too difficult, but my suggestion would be to write the interceptor for fopen and get familiar with the dlsym trick first (vs. trying to debug it within a loop).
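A first fopen interceptor might look like the sketch below. The log format and the fopen_fake prefix are arbitrary, and the file names in the build comment are placeholders. Depending on how the program was built, you may also need a matching fopen64 wrapper.

```c
#define _GNU_SOURCE          /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Build and run (file names are placeholders):
 *   gcc -shared -fPIC -o libspy.so spy.c -ldl
 *   LD_PRELOAD=./libspy.so ./your_program
 */

// fopen -- intercept fopen calls
FILE *
fopen(const char *path, const char *mode)
{
    static FILE *(*fopen_real)(const char *, const char *) = NULL;
    FILE *fp;

    // locate the real symbol in glibc (once)
    if (fopen_real == NULL) {
        fopen_real = (FILE *(*)(const char *, const char *))
            dlsym(RTLD_NEXT, "fopen");
        if (fopen_real == NULL) {
            fprintf(stderr, "fopen_fake: could not find the real fopen\n");
            abort();
        }
    }

    fp = fopen_real(path, mode);

    fprintf(stderr, "fopen_fake: path=%s mode=%s -> %p\n",
        path, mode, (void *) fp);

    return fp;
}
```

Once this prints a line for the fopen("text.txt", "wb") call in the test program, the same pattern carries over to fread and fwrite.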

You can create as many of these functions as you need. Start simply (e.g. fopen, fread, fwrite). Then, add more as you find a "gap" in the coverage (e.g. you'll probably find, eventually, that intercepting fseek gives you needed information).
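An fwrite interceptor follows the same shape as the fread one. The sketch below adds a reentrancy guard (the busy flag is my own addition) because the compiler may turn some fprintf/fputs calls with constant strings into calls to fwrite, which would land back in the interceptor.

```c
#define _GNU_SOURCE          /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

// fwrite -- intercept fwrite calls
size_t
fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    static size_t (*fwrite_real)(const void *, size_t, size_t, FILE *) = NULL;
    static _Thread_local int busy = 0;  // nonzero while we are logging
    size_t xlen;

    // locate the real symbol in glibc (once)
    if (fwrite_real == NULL) {
        fwrite_real = (size_t (*)(const void *, size_t, size_t, FILE *))
            dlsym(RTLD_NEXT, "fwrite");
        if (fwrite_real == NULL)
            abort();
    }

    // re-entered from our own logging: pass straight through
    if (busy)
        return fwrite_real(ptr, size, nmemb, stream);

    busy = 1;
    fprintf(stderr, "fwrite_fake: ENTER ptr=%p size=%zu nmemb=%zu stream=%p\n",
        (void *) ptr, size, nmemb, (void *) stream);

    xlen = fwrite_real(ptr, size, nmemb, stream);

    fprintf(stderr, "fwrite_fake: EXIT xlen=%zu\n", xlen);
    busy = 0;

    return xlen;
}
```

With the test program from the question, this should log one call with size=1 and nmemb=100, where ptr is the address of buff.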

UPDATE #2:

Here is a sample strace script with the options that I like to use:

strace -ttt -i -f -etrace=all -o \
    /home/me/log/foobar.spysys \
    -eread=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 \
    -ewrite=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 \
    -x foobar

The -eread and -ewrite lists are file descriptor numbers whose read and write data get dumped in full; they can be extended to cover as many descriptors as you need.