Search code examples
javaarraysalgorithmoptimization

Karatsuba multiplication optimization on byte array


I have implemented the multiplication for two byte arrays and it works fine. More precisely, I need to multiply a 64 bytes operand with a 32 bytes one.

I implemented it the easiest way: In two nested loops I compute the product for each byte in each array. So for the specific values, it takes 64 * 32 = 2048 steps.

I tried to optimize it with the Karatsuba method. So I proceed the following way:

a has a length of 64 bytes and b has a length of 32 bytes.

We split a in: a = p * 16^(32) + q (so p and q have both a length of 32 bytes) and compute: a * b = p * b * 16^(32) + q * b (products are compute with my function described before).

I get the right result but it takes the same time to compute: 2 multiplications of two 32 bytes arrays: 32 * 32 * 2 = 64 * 32 = 2048.

To optimize my multiplication using Karatsuba, should I program it totally recursively?
Otherwise it will never be faster?


Solution

  • Yes, the Karatsuba algorithm is only effective if you're doing it recursively. But remember: Karatsuba is not always faster than the simple algorithm, that takes O(n^2) (usually we assume that both numbers has the same length, if we're going to multiply large numbers). For small inputs (it can be 1, it can be 15, depending on your CPU) the simple algorithm can be faster, so the optimal use of Karatsuba is:

    1. if size > MIN_SIZE_FOR_KARATSUBA (you have to find it experimentally), then do the split and recursively call the Karatsuba.
    2. If size <= MIN_SIZE_FOR_KARATSUBA, then just multiply them with simple algorithm.

    And also, don't split your array into bytes in multiplication, store them in integers in base 1000 or something like that, because you can overflow your type easily.

    The best implementation of Karatsuba's algorithm is described in this link. Usually Karatsuba takes O(n log n) memory, but here with some tricks it takes only O(n) memory.

    If you don't want to use function calls many times (because function calls is the slowest operation in programming), then you can use loops and implement a stack by yourself, as in my implementation.