python, c, linux-kernel, ftrace

Extract function names and their comments from C code with python (to understand the Linux kernel)


Background Information

I've just started learning about drivers and the Linux kernel. I want to understand how a user-space write() and read() work, so I started using ftrace, hoping to see the path the calls take through the kernel. But the trace of even a single program like the following is enormous.

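#define _GNU_SOURCE   /* needed for O_DIRECT */
#include <fcntl.h>
#include <unistd.h>

int main() {
    int w;
    char buffer[] = "test string mit 512 byte";
    int fd = open("/dev/sdd", O_DIRECT | O_RDWR | O_SYNC);
    w = write(fd, buffer, sizeof(buffer));
    close(fd);
    return w < 0 ? 1 : 0;   /* non-zero exit status if the write failed */
}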

I also don't know which functions I could filter, because I don't know the Linux Kernel and I don't want to throw something important away.
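For reference, the function_graph tracer can be limited to a single entry point and everything it calls through the set_graph_function file, which avoids having to guess which functions to filter out. A minimal sketch, assuming tracefs is mounted at /sys/kernel/debug/tracing and that vfs_write is a suitable entry point for a write() call (both are assumptions, not something stated in this post; the helper name limit_graph_trace is made up for illustration):

```python
import os


def limit_graph_trace(entry_function, tracing_dir="/sys/kernel/debug/tracing"):
    """Restrict the function_graph tracer to entry_function and its callees.

    tracing_dir is an assumed mount point; writing there needs root.
    """
    def write(name, value):
        with open(os.path.join(tracing_dir, name), "w") as f:
            f.write(value + "\n")

    write("tracing_on", "0")                      # pause tracing while reconfiguring
    write("set_graph_function", entry_function)   # only graph this function's call tree
    write("current_tracer", "function_graph")
    write("tracing_on", "1")


# Example (as root): limit_graph_trace("vfs_write")
# The result can then be read from the "trace" file in the same directory.
```

On some architectures the symbol names in the trace carry a leading dot, as in the snippet below, so the exact name to write may differ.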

So I've started to work through a function_graph trace. Here is a snippet.

 [...]
 12)   0.468 us    |            .down_write();
 12)   0.548 us    |            .do_brk();
 12)   0.472 us    |            .up_write();
 12)   0.762 us    |            .kfree();
 12)   0.472 us    |            .fput();
 [...]

I saw .down_write() and .up_write() and thought this was exactly what I was looking for, so I looked them up. The down_write() source code:

 /*
 * lock for writing
 */
 void __sched down_write(struct rw_semaphore *sem)
 {
       might_sleep();
       rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);

       LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
 }

But it turned out that these just acquire and release locks. Then I started to write a small reference for myself, so I don't always have to look this stuff up, because it feels like there are over 9000 functions. Then I had an idea: why not parse these functions and their comments and append the comments to the functions in the trace file? Like this:

 [...]
 12)   0.468 us    |            .down_write(); lock for writing
 12)   0.548 us    |            .do_brk(); 
 12)   0.472 us    |            .up_write(); release a write lock
 12)   0.762 us    |            .kfree();
 12)   0.472 us    |            .fput();
 [...]
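The annotation step itself is the easy part. A minimal sketch, assuming the comments are already available as a dict mapping function names to one-line descriptions (the names CALL_RE and annotate_trace are made up for illustration, and the optional leading dot is stripped before the lookup):

```python
import re

# Matches the function name in a function_graph line such as
# " 12)   0.468 us    |            .down_write();"
# The leading dot seen in the trace above is optional and not captured.
CALL_RE = re.compile(r"\|\s*\.?(\w+)\(\)")


def annotate_trace(lines, comments):
    """Append the known comment, if any, to each function_graph line."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        m = CALL_RE.search(line)
        note = comments.get(m.group(1)) if m else None
        out.append(f"{line.rstrip()} {note}" if note else line)
    return out


if __name__ == "__main__":
    trace = [
        " 12)   0.468 us    |            .down_write();",
        " 12)   0.548 us    |            .do_brk();",
        " 12)   0.472 us    |            .up_write();",
    ]
    known = {"down_write": "lock for writing", "up_write": "release a write lock"}
    print("\n".join(annotate_trace(trace, known)))
```

Lines without a known comment, and lines that are not calls at all (closing braces, "[...]"), pass through unchanged.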

The main Problem

So I've started to think about how I can achieve this. I would like to do it with python, because I feel most comfortable with it.

1. Problem
To match the C functions and comments, I have to define and implement a recursive matching grammar :(

2. Problem
Some functions are just wrappers and have no comments. For example, do_brk() wraps __do_brk(), and the comment is only above __do_brk().
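Both problems can be approximated without a full grammar. The following is only a rough heuristic sketch, not a C parser: it assumes a block comment sits directly above a function definition that starts in column 0, and it assumes a wrapper foo() is documented on __foo() (a naming convention taken from the do_brk() example, which will not hold everywhere). The names DEF_RE, extract_comments and lookup are made up for illustration, and the __do_brk comment in the sample is a placeholder, not the real kernel comment.

```python
import re

# A block comment, a newline, then a definition header starting in column 0
# and ending in "{". Prototypes (ending in ";") are deliberately not matched.
DEF_RE = re.compile(
    r"/\*(?P<comment>(?:[^*]|\*(?!/))*)\*/[ \t]*\n"
    r"(?=[A-Za-z_])[^;{}()/]*?\b(?P<name>\w+)\s*\([^;{}]*\)\s*\{"
)


def clean_comment(raw):
    """Strip the leading ' * ' decoration and join the lines with spaces."""
    parts = [ln.strip().lstrip("*").strip() for ln in raw.splitlines()]
    return " ".join(p for p in parts if p)


def extract_comments(source):
    """Map function name -> comment found directly above its definition."""
    return {m.group("name"): clean_comment(m.group("comment"))
            for m in DEF_RE.finditer(source)}


def lookup(name, comments):
    """Return the comment for name, falling back to the __name variant."""
    for candidate in (name, "__" + name):
        if candidate in comments:
            return comments[candidate]
    return None


SAMPLE = """
/*
 * lock for writing
 */
void __sched down_write(struct rw_semaphore *sem)
{
    might_sleep();
}

/*
 * placeholder text standing in for the real __do_brk comment
 */
static unsigned long __do_brk(unsigned long addr, unsigned long len)
{
    return addr;
}

unsigned long do_brk(unsigned long addr, unsigned long len)
{
    return __do_brk(addr, len);
}
"""

if __name__ == "__main__":
    found = extract_comments(SAMPLE)
    for fn in ("down_write", "do_brk", "kfree"):
        print(fn, "->", lookup(fn, found))
```

Macros, preprocessor conditionals, and comments separated from the definition by other code will defeat this heuristic, so it is only good enough for a personal cheat sheet.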

So I thought that maybe there are other sources for the comments. Maybe docs? It's also possible that somebody has already implemented this kind of "doc generation" in Python.

Or is my approach to understanding a system read() and write() misguided? Can you give me tips on how I should dig deeper?

Thank you very much for reading,
Fabian


Solution

  • Parsing comments is quite hard in practice, and parsing kernel code is not especially easy.

    First, you should understand precisely what a system call is in the Linux kernel, and how applications use system calls. The Linux Assembly HowTo has good explanations.

    Then, you should understand the organization of the Linux kernel. I strongly suggest reading some good books on this.

    Exploring the kernel source code with automatic tools is a large amount of work (months, not days). You might consider the coccinelle tool (for so-called "semantic patches"). You could also consider customizing the GCC compiler with plugins, or better yet, with MELT extensions (MELT is a high-level domain-specific language to extend GCC; I am its main designer and implementor).

    If working with GCC, you'll get all the power of GCC internal representations and processing in the middle-end (but at this stage comments are lost).

    What you are trying to do is probably much more ambitious than what you initially thought. See also Alexandre Lissy's work, e.g. model-checking the Linux kernel and the papers he will present at Linux Symposium 2012 (July 2012).