
Why does Linux set the data segment to __USER_DS in the prologue of the exception handler?


I'm reading the Linux source code (2.6.11).

In the exception handler in entry.S, at error_code:

movl $(__USER_DS), %ecx
movl %ecx, %ds
movl %ecx, %es

I don't understand why the user data segment is loaded here. This is the entry to the exception handler, which runs in kernel mode, so I would expect the selector to be __KERNEL_DS.

I checked other versions of the code, and they do the same thing at this point.


Solution

  • If the exception handler is entered with ds and es already set to that data segment, reloading them makes no difference except for perhaps a microsecond of delay. Exception handlers don't usually need to be fast.

    But what might cause entry into the exception handler? Could a bad value have been loaded into a segment register and then referenced? In such cases it is important for the code to establish a safe environment. cs is set by the exception invocation. To be bulletproof, ss and esp should be set up too.


    Followup:

    Looking at the 2.6.22.18 kernel for i386, I don't see exactly that:

    error_code:   /* the function address is in %fs's slot on the stack */
         pushl  %es
         ...  pushes %ds, %eax, %ebp, %edi, %esi, %edx, %ecx, %ebx, %fs
         ...  along with pseudo-ops to manage stack frame layout
         movl  $(__KERNEL_PERCPU), %ecx
         movl  %ecx, %fs
         popl  %ecx   // retrieves saved %fs
         ... sets up registers for the exception function
    

    The symbol __KERNEL_PERCPU is a macro defined (in include/asm-i386/segment.h) as 0 for non-SMP machines and as (GDT_ENTRY_PERCPU * 8) for SMP machines. The 8 is the GDT entry size (I think), and GDT_ENTRY_PERCPU refers to an entry in the per-CPU GDT. Its value is <base> + 15, which, as I read the comments, is "default user DS", so it appears to be the same thing.
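
The "multiply by 8" step can be sketched in C. On x86 a selector holds the descriptor index in bits 15..3, a table-indicator bit in bit 2, and the requested privilege level (RPL) in bits 1..0, so shifting the index left by 3 is the same as multiplying by the 8-byte descriptor size. The helper names and the `_ASSUMED` constants below are hypothetical; the slot numbers 13 and 15 are my assumption about the 2.6-era i386 GDT layout and should be checked against segment.h for the kernel version in question.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed GDT slot numbers for a 2.6-era i386 kernel (verify in segment.h). */
enum {
    GDT_ENTRY_KERNEL_DS_ASSUMED = 13,
    GDT_ENTRY_USER_DS_ASSUMED   = 15
};

/* Build a selector: index in bits 15..3, table indicator (0 = GDT, 1 = LDT)
 * in bit 2, requested privilege level in bits 1..0. */
static inline uint16_t make_selector(unsigned index, unsigned use_ldt, unsigned rpl)
{
    assert(index < 8192 && use_ldt <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (use_ldt << 2) | rpl);
}

/* Recover the descriptor index from a selector. */
static inline unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}

/* Recover the requested privilege level from a selector. */
static inline unsigned selector_rpl(uint16_t sel)
{
    return sel & 3u;
}
```

Under those assumed slot numbers, `make_selector(GDT_ENTRY_USER_DS_ASSUMED, 0, 3)` gives the user data selector with RPL 3, and `make_selector(GDT_ENTRY_KERNEL_DS_ASSUMED, 0, 0)` gives the kernel data selector with RPL 0.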

    The kernel data segment is accessed through fs and ss. Much kernel data access is on the stack. By keeping the user-mode descriptor loaded in ds, very little reloading of segment registers is needed on the way into or out of the kernel.
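
Why kernel code may keep a user-mode selector in ds at all can be illustrated with the protected-mode privilege rule for data segments, as I understand it: loading a selector into ds or es succeeds when the descriptor's privilege level (DPL) is numerically at least as large as both the current privilege level (CPL) and the selector's RPL. This is a minimal sketch of that one rule only. The function name is hypothetical, and the different rule for ss is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the x86 protected-mode check for loading a data-segment
 * selector into ds/es: allowed when DPL >= max(CPL, RPL).
 * Lower numbers mean more privilege (0 = kernel, 3 = user). */
static inline bool data_segment_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    unsigned effective = (cpl > rpl) ? cpl : rpl;  /* least-privileged of the two */
    return dpl >= effective;
}
```

With this rule, kernel code (CPL 0) loading a user data selector (RPL 3, DPL 3) passes the check, while user code (CPL 3) loading a kernel data selector (DPL 0) does not. That asymmetry is what makes it safe to leave the user selector in ds while the handler runs.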