kernel, process, io, block-device, scheduling

How is a process state updated to blocked state (TASK_INTERRUPTIBLE)?


When a process is waiting for I/O, how is the task state updated to TASK_INTERRUPTIBLE (that is, blocked)?

Imagine this case: a process issues an I/O request to a block device. According to my previous thread, the process eventually invokes elv_add_request() to add the request to the I/O queue. So I guess the implementation of elv_add_request() looks something like this:

elv_add_request(){
   // Register IO_CALLBACK()
   set_task_state(task, TASK_INTERRUPTIBLE); // blocked
   // flush IO request to disk
   ...
}

IO_CALLBACK(){
    set_task_state(task, TASK_RUNNING); // IO completed, ready to run
}

The logic is this: when the I/O request finishes, the callback function notifies the kernel that the process is ready to run again. Does that make sense?

If that's the case, how is the callback mechanism implemented? Is it a CPU/hardware feature?


Solution

  • It behaves similarly to what you describe, except that the I/O callback is set before elv_add_request() is called, and the task sleeps in TASK_UNINTERRUPTIBLE rather than TASK_INTERRUPTIBLE. If we take the stack from the previous thread:

     [<c027fac4>] error_code+0x74/0x7c
     [<c019ed65>] elv_next_request+0x6b/0x116
     [<e08335db>] scsi_request_fn+0x5e/0x26d [scsi_mod]
     [<c019ee6a>] elv_insert+0x5a/0x134
     [<c019efc1>] __elv_add_request+0x7d/0x82
     [<c019f0ab>] elv_add_request+0x16/0x1d
     [<e0e8d2ed>] pkt_generic_packet+0x107/0x133 [pktcdvd]
     [<e0e8d772>] pkt_get_disc_info+0x42/0x7b [pktcdvd]
     [<e0e8eae3>] pkt_open+0xbf/0xc56 [pktcdvd]
     [<c0168078>] do_open+0x7e/0x246
     [<c01683df>] blkdev_open+0x28/0x51
     [<c014a057>] __dentry_open+0xb5/0x160
     [<c014a183>] nameidata_to_filp+0x27/0x37
     [<c014a1c6>] do_filp_open+0x33/0x3b
     [<c014a211>] do_sys_open+0x43/0xc7
     [<c014a2cd>] sys_open+0x1c/0x1e
     [<c0102b82>] sysenter_past_esp+0x5f/0x85
    

    The call chain (I'm looking at the 4.1-rc1 source) goes like this:

    pkt_generic_packet() 
      blk_execute_rq()
        initialize a 'struct completion' object
        set it as 'struct request->end_io_data'
        blk_execute_rq_nowait(..., blk_end_sync_rq) // that's the io callback
        wait_for_completion_io() // sets the task to TASK_UNINTERRUPTIBLE and sleeps until the 'struct completion' object is completed
    
     ...
     The I/O then happens. One possible completion path is:
       blk_end_request()
         blk_end_bidi_request()
           blk_finish_request()
             req->end_io() // This is blk_end_sync_rq
               blk_end_sync_rq()
                 complete() // signals completion and wakes the waiter (wake mask TASK_NORMAL), which becomes TASK_RUNNING
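
The pattern in that call chain (register a callback, sleep on a completion object, let the callback wake the sleeper) can be sketched in user space. This is only an analogy, not kernel code: every name ending in _sketch is hypothetical, a pthread mutex and condition variable stand in for 'struct completion', and a helper thread plays the role of the device finishing the I/O.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's 'struct completion' (hypothetical). */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
};

/* Stand-in for 'struct request': only the fields this pattern needs. */
struct request_sketch {
    int   payload;
    int   result;
    void (*end_io)(struct request_sketch *rq); /* the I/O callback */
    void *end_io_data;                         /* points at the completion */
};

static void init_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Analogue of wait_for_completion_io(): the caller sleeps until done is set. */
static void wait_for_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Analogue of complete(): mark the completion done and wake the waiter. */
static void complete_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Analogue of blk_end_sync_rq(): the callback only signals the waiter. */
static void end_sync_rq_sketch(struct request_sketch *rq)
{
    complete_sketch(rq->end_io_data);
}

/* Pretend device: "performs" the I/O, then invokes the callback. */
static void *device_thread_sketch(void *arg)
{
    struct request_sketch *rq = arg;

    rq->result = rq->payload * 2;   /* fake I/O result */
    assert(rq->end_io != NULL);
    rq->end_io(rq);
    return NULL;
}

/* Analogue of blk_execute_rq(): set the callback first, submit, then sleep. */
int execute_rq_sketch(int payload)
{
    struct completion_sketch wait;
    struct request_sketch rq = { .payload = payload, .result = -1 };
    pthread_t dev;
    int started;

    init_completion_sketch(&wait);
    rq.end_io_data = &wait;
    rq.end_io = end_sync_rq_sketch;          /* callback registered first */

    started = pthread_create(&dev, NULL, device_thread_sketch, &rq);
    if (started == 0) {
        wait_for_completion_sketch(&wait);   /* caller blocks here */
        pthread_join(dev, NULL);
    }
    pthread_mutex_destroy(&wait.lock);
    pthread_cond_destroy(&wait.cond);
    return started == 0 ? rq.result : -1;
}
```

Built with -pthread, execute_rq_sketch(21) returns 42 once the helper thread has run the callback. As in the kernel path above, the callback is installed before the request is handed off, so the completion cannot be missed even if the "device" finishes before the caller starts waiting.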