assembly optimization x86 micro-optimization

Redundant value copying in assembly?

I am learning x86 assembly from the book Programming from the Ground Up. When introducing functions, the author gives an example of a function that raises a given 4-byte integer to a power greater than 0. Here is how the function is defined (this is my own version, but the code is almost identical):

 1. .type find_pow, @function
 2. find_pow:
 3. pushl %ebp          # Save current base pointer
 4. movl %esp, %ebp     # Copy stack pointer to base pointer
 5. movl $1, %eax        # %eax will hold the result, set it to 1
 6.
 7. subl $4, %esp
 8. movl 8(%ebp), %ebx
10. movl 12(%ebp), %ecx
11.
12. movl %ebx, -4(%ebp)
13.
14. loop_find_pow:      # Start loop
 15. cmpl $1, %ecx        # If 2nd parameter equals 1
16. je exit_find_now    # Exit loop
17. movl -4(%ebp), %eax # Move current result to %eax
18. imull %ebx, %eax    # Multiply to current value of %eax
19. movl %eax, -4(%ebp) # Store current result
20. decl %ecx           # Decrease %ecx
21. jmp loop_find_pow   # Jump to loop top
22.
23. exit_find_now:
24. movl -4(%ebp), %eax
25. movl %ebp, %esp
26. popl %ebp
27. ret

I understand the code completely, but I am a little confused about why the author does what he does in lines 17...19. First, the value of the local variable is copied from the stack to %eax. Then the computation is done in %eax (I suppose this is for performance reasons?), and the calculated value is stored back in the space allocated for the only local variable. I fail to understand why all this copying is required. The caller obtains the return value by examining the %eax register anyway. Doesn't it make sense to completely eliminate use of the local variable (at address -4(%ebp))?

Edit: If anyone wants to look at the original code in the book, browse to page 67.

Solution

Doesn't it make sense to completely eliminate use of the local variable (at address -4(%ebp))?

Yes.

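Because %eax is a volatile (caller-saved) register under the ABI, the function need not preserve its original value. It is also where the caller looks for the return value, so the running result can simply live in %eax for the whole loop.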

An algorithm this simple can keep all of its state in registers, without having to spill them onto the stack.

However, this is a simple example intended to teach; one of the things it demonstrates is how to use stack variables when you don't have enough registers.
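
For illustration, here is a rough sketch of what a register-only version could look like. It is not the book's code. It assumes the same cdecl-style stack layout as the question (first parameter at 8(%ebp), second at 12(%ebp)) and a power of at least 1. I have also swapped %ebx for %edx, which is my own choice: under the usual 32-bit calling conventions %ebx is callee-saved, while %eax, %ecx and %edx may be freely overwritten. The label exit_find_pow is likewise just a name picked for this sketch.

```asm
.type find_pow, @function
find_pow:
    pushl %ebp              # Save caller's base pointer
    movl  %esp, %ebp        # Set up our own stack frame
    movl  8(%ebp), %edx     # %edx = base (1st parameter)
    movl  12(%ebp), %ecx    # %ecx = power (2nd parameter, assumed >= 1)
    movl  %edx, %eax        # %eax = running result, starts out as base

loop_find_pow:
    cmpl  $1, %ecx          # Done once the remaining power is 1
    je    exit_find_pow
    imull %edx, %eax        # result = result * base, kept in %eax
    decl  %ecx              # One multiplication fewer to go
    jmp   loop_find_pow

exit_find_pow:
    movl  %ebp, %esp        # No locals to release; restores %esp anyway
    popl  %ebp              # Restore caller's base pointer
    ret                     # Result is already in %eax
```

No stack space is reserved with subl, and the three loads and stores of -4(%ebp) disappear, because the result never leaves %eax. As in the original, a power of 0 is not handled.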