
Performing XOR to unmask WebSocket data frames: cu* versus I*c*?


I came across this "32-bit integer" method of reading and applying the XOR mask to WebSocket data and tried it out in the very simple local server of my desktop application. It came out about ten times faster, and I wondered whether it should really be that much quicker.

It is at https://wiki.tcl-lang.org/page/WebSocket+Client+Library?R=0, in the code for proc ::websocket::__mask { mask dta }.

I modified it very slightly to create proc xor32, below, but was not expecting it to be that much faster: maybe 4 times, but not 10. The proc xor is the per-byte method shown in all the other instructions I could find when I first researched how to receive messages over a WebSocket.

My question is: is this a good approach, and is it truly expected to be 10 times faster, or am I misinterpreting or misunderstanding the results?

Thank you.

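proc xor {mask input} {
  # Byte-at-a-time unmasking: byte i is XORed with mask byte (i mod 4)
  binary scan $mask cu4 mask_key
  binary scan $input cu* pre_xor
  set offset -1
  set post_xor {}
  foreach b $pre_xor {
    # Five Tcl commands (append, expr, lindex, expr, incr) run per byte
    append post_xor \
      "[expr {$b ^ [lindex $mask_key [expr {[incr offset] % 4}]]}] "
  }
  return [binary format cu* $post_xor]
}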

proc xor32 { mask input } {
  # Format data as a list of 32-bit integer words and a list of
  # leftover 8-bit integer bytes.  Then unmask the data, recombine
  # the words and bytes, and return the result
  binary scan $mask Iu mask_key
  binary scan $input I*c* words bytes
  set masked_words {}
  set masked_bytes {}
  foreach word $words {
    lappend masked_words [expr {$word ^ $mask_key}]
  }
  set i -1
  foreach byte $bytes {
    # Shift mask byte i into the low 8 bits; binary format c keeps
    # only the low 8 bits, so the higher bits are discarded
    lappend masked_bytes\
       [expr {$byte ^ ($mask_key >> (24 - 8 * [incr i]))}]
  }
  return [binary format I*c* $masked_words $masked_bytes]
}
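Why the word-wise version gives the same bytes: WebSocket masking (RFC 6455, section 5.3) XORs byte i with mask byte i mod 4. Every 32-bit word read with I starts at an offset divisible by 4 and is big-endian, so its four bytes line up with mask bytes 0, 1, 2 and 3, which is exactly the key read with Iu. The leftover bytes sit at positions 0 to 2 of a final partial word, so shifting the key right by 24 - 8*i brings the matching mask byte into the low 8 bits. The following is a minimal sketch that checks this on a hypothetical 7-byte input; the variable names are illustrative and not taken from the code above.

```tcl
# Sketch: compare the byte-wise rule out[i] = in[i] ^ mask[i % 4]
# with word-wise masking on 7 bytes (one 32-bit word + 3 leftovers).
set mask  [binary format c4 {100 42 9 67}]
set input [binary format c7 {1 2 3 4 5 6 7}]

# Byte-wise reference: one XOR per byte.
binary scan $mask cu4 maskBytes
binary scan $input cu* inBytes
set ref {}
set i 0
foreach b $inBytes {
    lappend ref [expr {$b ^ [lindex $maskBytes [expr {$i % 4}]]}]
    incr i
}

# Word-wise: the whole mask read as one unsigned big-endian key.
binary scan $mask Iu key
binary scan $input Iu*cu* words rest
set out {}
foreach w $words {
    lappend out [expr {$w ^ $key}]
}
set tail {}
set j 0
foreach b $rest {
    # Bring mask byte j to the bottom and keep only the low 8 bits.
    lappend tail [expr {($b ^ ($key >> (24 - 8 * $j))) & 0xFF}]
    incr j
}
set wordwise [binary format I*c* $out $tail]

binary scan $wordwise cu* got
puts $got                                    ;# 101 40 10 71 97 44 14
puts [expr {$got eq $ref ? "match" : "MISMATCH"}]   ;# match
```

The same alignment argument is what lets xor32 do one expr per word instead of one lookup and XOR per byte.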

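set filename {book.html}
set fp [open $filename r]
# Read raw bytes: otherwise the data is decoded into characters and
# line endings are translated before it is masked
chan configure $fp -translation binary
set size [file size $filename]
puts "message size: $size"
set message [chan read $fp]
chan close $fp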
set maskKeys {100 42 9 67}
# set binmask [binary format I1 $maskKeys]
set binmask [binary format cu4 $maskKeys]

set encoded [xor $binmask $message]
#puts [xor $binmask $encoded]
puts "xor avg time 100 iterations: [time {xor $binmask $encoded} 100]"

set encoded [xor32 $binmask $message]
#puts [xor32 $binmask $encoded]
puts "xor32 avg time 100 iterations: [time {xor32 $binmask $encoded} 100]"

# message size: 1063714
# xor avg time 100 iterations: 299470.44 microseconds per iteration
# xor32 avg time 100 iterations: 28932.12 microseconds per iteration
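For reference, here is how the benchmark's mask bytes {100 42 9 67} become the single key that xor32 XORs against each word: binary scan with Iu reads the four bytes as one unsigned big-endian integer, so the first mask byte ends up as the most significant byte. The sketch also shows repeating the key into 64 bits for 8-byte chunks, which needs bitwise | rather than logical ||. This is an illustrative sketch; the variable names are not from the original script.

```tcl
# Sketch: the benchmark's four mask bytes read as one 32-bit key.
set binmask [binary format c4 {100 42 9 67}]

# Iu = unsigned 32-bit big-endian, so mask byte 0 becomes the top byte.
binary scan $binmask Iu key
puts $key                              ;# 1680476483
puts [format 0x%08X $key]              ;# 0x642A0943

# The same value assembled by hand from the individual bytes.
set manual [expr {(100 << 24) | (42 << 16) | (9 << 8) | 67}]
puts [expr {$manual == $key}]          ;# 1

# Repeating the key for 8-byte chunks needs bitwise |;
# logical || only ever yields 0 or 1.
set key64 [expr {($key << 32) | $key}]
puts [format 0x%016lX $key64]          ;# 0x642A0943642A0943
puts [expr {$key || ($key << 32)}]     ;# 1
```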


Solution

  • Count the number of Tcl commands each approach executes per byte, and the performance makes sense. Extending the principle to processing 64-bit chunks at a time, with xor64 as:

    proc xor64 { mask input } {
      # Format data as a list of 64-bit integer words and a list of
      # leftover 8-bit integer bytes.  Then unmask the data, recombine
      # the words and bytes, and return the result
      binary scan $mask Iu mask_key
      binary scan $input W*c* qwords bytes
      set masked_qwords {}
      set masked_bytes {}
      # Repeat the 32-bit key in both halves (bitwise |, not logical ||)
      set qmask_key [expr {$mask_key | ($mask_key << 32)}]
      foreach qword $qwords {
        lappend masked_qwords [expr {$qword ^ $qmask_key}]
      }
      set i -1
      foreach byte $bytes {
        lappend masked_bytes\
           [expr {$byte ^ ($qmask_key >> (56 - 8 * [incr i]))}]
      }
      return [binary format W*c* $masked_qwords $masked_bytes]
    }

    you can see that the pattern continues to hold: the time per iteration falls roughly in proportion to the reduction in the number of Tcl commands executed:

    message size: 1549
    xor avg time 100 iterations: 285.66 microseconds per iteration 775504 Tcl commands
    xor32 avg time 100 iterations: 25.9 microseconds per iteration 78904 Tcl commands
    xor64 avg time 100 iterations: 11.45 microseconds per iteration 41504 Tcl commands
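These counts work out to about five commands per byte for xor (775504 over 100 iterations of 1549 bytes) and about two per 4- or 8-byte chunk for xor32 and xor64. One way to measure counts like these is Tcl's built-in info cmdcount, which returns the total number of commands the interpreter has invoked so far. The sketch below assumes that approach; countCommands and the two toy loops are hypothetical stand-ins, not the harness used to produce the numbers above.

```tcl
# Hypothetical helper: number of Tcl commands executed while running
# a script in the caller's scope, via the interpreter's command counter.
proc countCommands {script} {
    set before [info cmdcount]
    uplevel 1 $script
    return [expr {[info cmdcount] - $before}]
}

# Toy workloads: one XOR per byte versus one XOR per 32-bit word.
set data [string repeat "abcd" 256]   ;# 1024 bytes
binary scan $data cu* bytes
binary scan $data Iu* words

set perByte [countCommands {
    foreach b $bytes { set x [expr {$b ^ 100}] }
}]
set perWord [countCommands {
    foreach w $words { set x [expr {$w ^ 1680476483}] }
}]
puts "per byte: $perByte commands, per word: $perWord commands"
```

The per-byte count should come out roughly four times the per-word count, mirroring the gap between xor and xor32: the speed-up comes from running fewer Tcl commands, not from the XOR itself being cheaper.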