Search code examples
benchmarkingsmalltalkpharovisualworksgnu-smalltalk

Efficency of assign-and-compare in the same statement in Smalltalk


A previous SO question raised the issue about which idiom is better in time of execution efficency terms:

[ (var := exp) > 0 ] whileTrue: [ ... ]

versus

[ var := exp. 
  var > 0 ] whileTrue: [ ... ]

Intuitively it seems the first form could be more efficient during execution, because it saves fetching one additional statement (second form). Is this true in most Smalltalks?

Trying with two stupid benchmarks:

| var acc |
var := 10000.
[ [ (var := var / 2) < 0  ] whileTrue: [ acc := acc + 1 ] ] bench.

| var acc |
var := 10000.
[ [ var := var / 2. var < 0  ] whileTrue: [ acc := acc + 1 ] ] bench

Reveals no major differences between both versions.

Any other opinions?


Solution

  • So the question is: What should I use to achieve a better execution time?

    temp := <expression>.
    temp > 0
    

    or

    (temp := <expression>) > 0
    

    In cases like this one, the best way to arrive at a conclusion is to go down one step in the level of abstraction. In other words, we need a better understanding of what's happening behind the scenes.

    The executable part of a CompiledMethod is represented by its bytecodes. When we save a method, what we are doing is compiling it into a series of low level instructions for the VM to be able to execute the method every time it is invoked. So, let's take a look at the bytecodes of each one of the cases above.

    Since <expression> is the same in the same in both cases, let's reduce it drastically to eliminate noise. Also, let's put our code in a method so to have a CompiledMethod to play with

    Object >> m
      | temp |
      temp := 1.
      temp > 0
    

    Now, let's look CompiledMethod and its superclasses for some message that would show us the bytecodes of Object >> #m. The selector should contain the subword bytecodes, right?

    ...

    Here it is #symbolicBytecodes! Now let's evaluate (Object >> #m) symbolicBytecodes to get:

    pushConstant: 1
    popIntoTemp: 0
    pushTemp: 0
    pushConstant: 0
    send: >
    pop
    returnSelf
    

    Note by the way how our temp variable has been renamed to Temp: 0 in the bytecodes language.

    Now repeat with the other and get:

    pushConstant: 1
    storeIntoTemp: 0
    pushConstant: 0
    send: >
    pop
    returnSelf
    

    The difference is

    popIntoTemp: 0
    pushTemp: 0
    

    versus

    storeIntoTemp: 0
    

    What this reveals is that in both cases temp is read from the stack in different ways. In the first case, the result of our <expression> is popped into temp from the execution stack and then temp is pushed again to restore the stack. A pop followed by a push of the same thing. In the second case, instead, no push or pop happens and temp is simply read from the stack.

    So the conclusion is that in the first case we will be generating two cancelling instructions pop followed by push.

    This also explains why the difference is so hard to measure: push and pop instructions have direct translations into machine code and the CPU will execute them really fast.

    Note however, that nothing prevents the compiler to automatically optimize the code and realize that in fact pop + push is equivalent to storeInto. With such an optimization both Smalltalk snippets would result in exactly the same machine code.

    Now, you should be able to decide which form do you prefer. I my opinion such a decision should only take into account the programming style that you like better. Taking into consideration the execution time is irrelevant because the difference is minimal, and could be easily reduced to zero by implementing the optimization we just discussed. By the way, that would be an excellent exercise for those willing to understand the low level realms of the unparalleled Smalltalk language.