Search code examples
mathcryptographydiscrete-mathematicscode-complexitymontgomery-multiplication

Montgomery Multiplication - 32-bit register vs. 64-bit register


I need to calculate the speed difference between performing a Montgomery Multiplication page 602-603 with a word-size/register of size 32 vs. 64.

So far, this is what I understand:

  • x and y are represented by multiple-word arrays of length n where n = m/w and w is the register size (either 32 or 64).
  • The total number of single-digit multiplications in Montgomery multiplication is n*(2 + 2*n), where n represents the number length of the word-arrays.
  • I will assume that the multiplication of two single-digit takes 1 clock cycle on each of the computers.

How can I put all this together to represent the number of clock cycles needed in Montgomery multiplication on a computer with a 32-bit register or 64-bit register?


Solution

  • The number of cycles for a multiple-precision Montgomery multiplication would indeed be n(2+2*n) if all the intermediate single-precision multiplication operands and results were available in registers. For cryptographic operations this is hardly possible since m is usually 1024 or larger. Assuming 32-bit registers (xyR^-1 mod m) would require 192 registers only to store the operands (3*(1024/32)). In fact you need to take into account memory accesses to answer this question.

    A rewrite of the algorithm with memory accesses (assuming multiplications can be done in parallel with loads/stores):

    1. For i from 0 to n: a_i <- 0
    2. For i from 0 to (n − 1) do the following:
      1. Fetch a_0
      2. Fetch y_0
      3. Fetch x_i
      4. Compute u_i <- (a_0 + x_i*y_0)m' mod b. Store u_i in a register
      5. c = 0 (Computing A <- (A + x_i*y + u_i*m)/b)
      6. for j from 0 to (n-1):
        1. Fetch a_j
        2. Fetch y_j
        3. Compute (cv) = a_j + x_i*y_j + c, Fetch m_j
        4. Compute (cv) = (cv) + u_i*m_j, if j>0 Store a_{j-1} <- v
      7. Store a_n <- c and a_{n-1} <- v
    3. If A >= m then A <- A − m.
    4. Return(A).

    Hope this helps.