I tried to fit an implementation of NSA's SPECK into an 8-bit PIC microcontroller. The free version of their compiler (based on Clang) won't enable optimizations, so I ran out of memory. I then tried the "trial" version, which enables -O2, -O3 and -Os (optimize for size). With -Os it managed to fit my code in the 2K program memory space.
Here's the code:
#include <stdint.h>
#include <string.h>
#define ROR(x, r) ((x >> r) | (x << (32 - r)))
#define ROL(x, r) ((x << r) | (x >> (32 - r)))
#define R(x, y, k) (x = ROR(x, 8), x += y, x ^= k, y = ROL(y, 3), y ^= x)
#define ROUNDS 27
void encrypt_block(uint32_t ct[2],
                   uint32_t const pt[2],
                   uint32_t const K[4]) {
    uint32_t x = pt[0], y = pt[1];
    uint32_t a = K[0], b = K[1], c = K[2], d = K[3];
    R(y, x, a);
    for (int i = 0; i < ROUNDS - 3; i += 3) {
        R(b, a, i);
        R(y, x, a);
        R(c, a, i + 1);
        R(y, x, a);
        R(d, a, i + 2);
        R(y, x, a);
    }
    R(b, a, ROUNDS - 3);
    R(y, x, a);
    R(c, a, ROUNDS - 2);
    R(y, x, a);
    ct[0] = x;
    ct[1] = y;
}
Unfortunately, when I debug it line by line and compare it to the test vectors in the implementation guide (page 32, "15 SPECK64/128 Test Vectors"), the results differ from the expected ones.
Here's a way to call this function:
uint32_t out[2];
uint32_t in[] = { 0x7475432d, 0x3b726574 };
uint32_t key[] = { 0x3020100, 0xb0a0908, 0x13121110, 0x1b1a1918 };
encrypt_block(out, in, key);
assert(out[0] == 0x454e028b);
assert(out[1] == 0x8c6fa548);
The expected value for "out", according to the guide, should be 0x454e028b, 0x8c6fa548.
The result I'm getting with -O2 is 0x8FA3FED7, 0x53D8CEA8.
With -O1, I get 0x454e028b, 0x8c6fa548, which is the correct result.
Step Debugging
The implementation guide includes all the intermediate key schedule and other values, so I stepped through the code line by line, comparing the results to the guide.
The expected results for "x" are: 03020100, 131d0309, bbd80d53, 0d334df3. When I step through the code and reach the 4th result, the debugger window shows 0d334df0 instead of 0d334df3. In the next round, the expected 7fa43565 comes out as 7FA43578, and it only gets worse with every iteration.
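Instead of single-stepping, the key schedule can be dumped into an array and compared against the guide in one go. Here's a minimal sketch that reuses the macros from the code above. The helper name `expand_key` and its `rk` output array are made up for illustration; they are not from the guide.

```c
#include <stdint.h>

#define ROR(x, r) ((x >> r) | (x << (32 - r)))
#define ROL(x, r) ((x << r) | (x >> (32 - r)))
#define R(x, y, k) (x = ROR(x, 8), x += y, x ^= k, y = ROL(y, 3), y ^= x)
#define ROUNDS 27

/* Hypothetical helper: stores the 27 SPECK64/128 round keys in rk[]. */
void expand_key(uint32_t rk[ROUNDS], uint32_t const K[4]) {
    uint32_t a = K[0];
    uint32_t l[3] = { K[1], K[2], K[3] };  /* b, c, d from encrypt_block */

    rk[0] = a;
    for (uint32_t i = 0; i < ROUNDS - 1; i++) {
        /* Same key schedule step as R(b, a, i), R(c, a, i + 1), ...,
           cycling through b, c, d by index. */
        R(l[i % 3], a, i);
        rk[i + 1] = a;
    }
}
```

With the key from the test vector, `rk[0]` to `rk[4]` should be the five values listed above; a build that gives 0d334df0 in `rk[3]` is showing the problem.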
This only happens when -O2 or greater is enabled. With no optimizations, or with -O1, the code works as expected.
It was a bug in the compiler.
I posted the question in the manufacturer's forum. Other people have indeed reproduced the issue, which happens when compiling for certain parts. Other parts are unaffected.
As a workaround, I changed the macros into real functions, and split the rotation into two statements:
uint32_t ROL(uint32_t x, uint8_t r) {
    uint32_t intermedio;
    intermedio = x << r;
    intermedio |= x >> (32 - r);
    return intermedio;
}
This gives the correct result.
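For reference, here is a sketch of what the whole routine might look like with the same pattern applied to ROR and to the round itself. The names `rol32`, `ror32`, `speck_round` and `encrypt_block_fn` are illustrative, and the ROR body is an assumption that simply mirrors the ROL shown above. Whether this exact shape avoids the bug on every affected part is not something this sketch proves; it only reproduces the test vector from the guide.

```c
#include <stdint.h>

#define ROUNDS 27

/* Rotate left, split into two statements as in the workaround. */
static uint32_t rol32(uint32_t x, uint8_t r) {
    uint32_t t;
    t = x << r;
    t |= x >> (32 - r);
    return t;
}

/* Rotate right, assumed to follow the same two-statement pattern. */
static uint32_t ror32(uint32_t x, uint8_t r) {
    uint32_t t;
    t = x >> r;
    t |= x << (32 - r);
    return t;
}

/* One SPECK round: the same steps as the R() macro, as a function. */
static void speck_round(uint32_t *x, uint32_t *y, uint32_t k) {
    *x = ror32(*x, 8);
    *x += *y;
    *x ^= k;
    *y = rol32(*y, 3);
    *y ^= *x;
}

void encrypt_block_fn(uint32_t ct[2],
                      uint32_t const pt[2],
                      uint32_t const K[4]) {
    uint32_t x = pt[0], y = pt[1];
    uint32_t a = K[0];
    uint32_t l[3] = { K[1], K[2], K[3] };  /* b, c, d */

    speck_round(&y, &x, a);
    for (uint32_t i = 0; i < ROUNDS - 1; i++) {
        speck_round(&l[i % 3], &a, i);  /* key schedule step */
        speck_round(&y, &x, a);         /* encryption round */
    }
    ct[0] = x;
    ct[1] = y;
}
```

It is called the same way as `encrypt_block`, and with the key and plaintext from the guide it should produce 0x454e028b, 0x8c6fa548. Indexing `l[i % 3]` replaces the unrolled b, c, d sequence; that trades a little speed for smaller code, which may or may not help with the 2K limit.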