in this code:
#define G(gi1, gi2, x, t0, t1, t2, t3) \
lookup_32bit(t0, t1, t2, t3, ##gi1, RGS1, shr_next, ##gi1); \
lookup_32bit(t0, t1, t2, t3, ##gi2, RGS3, shr_next, ##gi2); \
\
lookup_32bit(t0, t1, t2, t3, ##gi1, RGS2, dummy, none); \
shlq $32, RGS2; \
orq RGS1, RGS2; \
lookup_32bit(t0, t1, t2, t3, ##gi2, RGS1, dummy, none); \
shlq $32, RGS1; \
orq RGS1, RGS3;
#define lookup_32bit(t0, t1, t2, t3, src, dst, interleave_op, il_reg) \
movzbl src ## bl, RID1d; \
movzbl src ## bh, RID2d; \
shrq $16, src; \
movl t0(CTX, RID1, 4), dst ## d; \
movl t1(CTX, RID2, 4), RID2d; \
movzbl src ## bl, RID1d; \
xorl RID2d, dst ## d; \
movzbl src ## bh, RID2d; \
interleave_op(il_reg); \
xorl t2(CTX, RID1, 4), dst ## d; \
xorl t3(CTX, RID2, 4), dst ## d;
"gi1" becomes RDX at the beginning, but beyond that I can't work out how it is used in the "movzbl" instruction. Basically, I can't figure out the operands in movzbl ??? ???, RID1d. I am a NASM user.
full code here: https://github.com/torvalds/linux/blob/master/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
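I'm still slightly confused by the uses of ## in G. I found a section of the GNU cpp manual which mentions ## after a comma, but it's meant for use in variadic macros, and this isn't one of those.
But I'm going ahead with an explanation anyway, based on the assumption that those ## are not pasting anything. (The one effect I would expect them to have: a macro parameter used as an operand of ## is substituted without being macro-expanded first, so ##gi1 would pass the name RGI1 along rather than its expansion.)
The ## in lookup_32bit, on the other hand, are perfectly normal and necessary.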
Let's go up a level from the G macro and see how it's called. One of its calls looks like this:
G(RGI1, RGI2, x1, s0, s1, s2, s3)
Its first argument, RGI1, becomes gi1 in the expansion. The first piece of the G macro:
lookup_32bit(t0, t1, t2, t3, ##gi1, RGS1, shr_next, ##gi1)
expands lookup_32bit with ##gi1 as the 5th and 8th arguments. I'm assuming ##gi1 simply passes the argument through as written, so the 5th and 8th arguments will be RGI1.
Inside the lookup_32bit macro, the 5th and 8th arguments are called src and il_reg, so both of those will expand to RGI1 in this instance. The first instruction in lookup_32bit:
movzbl src ## bl, RID1d;
pastes the src argument (RGI1) together with bl (which is not a macro or a macro argument, so it just represents itself), resulting in the pasted token RGI1bl. The instruction now looks like this:
movzbl RGI1bl, RID1d;
After the first pass of expanding lookup_32bit is done, the preprocessor will look again for macros to expand, and RGI1bl is a macro defined like this:
#define RGI1bl %dl
Also, RID1d is a macro defined like this:
#define RID1d %ebp
so the instruction ends up being:
movzbl %dl, %ebp;
and that's just a zero-extending move from 8-bit register %dl to 32-bit register %ebp. In NASM syntax, the same instruction is movzx ebp, dl.
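The paste-then-rescan sequence above can be reproduced with an ordinary C compiler. This is only a sketch: the register macros are redefined here as string literals so the result can be inspected, and MOVZBL_LOW and first_instruction are hypothetical names that do not appear in the kernel source.

```c
/* Stand-ins for the kernel's register macros, written as strings
 * (an assumption made for this demo; the real ones expand to bare
 * register names such as %dl). */
#define RGI1   "%rdx"
#define RGI1bl "%dl"
#define RID1d  "%ebp"

/* Hypothetical one-instruction version of lookup_32bit.
 * src is an operand of ##, so it is pasted as written (RGI1 + bl),
 * and the resulting token RGI1bl is then rescanned and expanded.
 * dst is not next to ##, so it is expanded normally. */
#define MOVZBL_LOW(src, dst) "movzbl " src ## bl ", " dst

/* Adjacent string literals are concatenated by the compiler. */
static const char *first_instruction(void)
{
    return MOVZBL_LOW(RGI1, RID1d);
}
```

Printing the return value of first_instruction() with puts should show the same line as the fully expanded instruction above.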
Looking at the other macros, you can see that there are a bunch of them starting with RGI1, all of which resolve to %rdx or portions of it. With these macros in place, selecting the low 8-bit portion of a 64-bit register can be done by pasting bl onto the end with ##, which wouldn't be possible using the native register names directly (there's no preprocessor operation as sophisticated as "remove the r from the front of this token and change the final x to an l").
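For a reader coming from NASM, it may help to see what the bl and bh suffixes select in terms of bits. The following is a rough C model, not code from the kernel: low8, high8 and four_bytes are hypothetical helper names, and the sequence only mirrors the order of the movzbl and shrq instructions in lookup_32bit (the table lookups and the interleave_op step are left out).

```c
#include <stdint.h>

/* What the "bl" register holds relative to the full register
 * (e.g. %dl within %rdx): bits 0..7. */
static uint32_t low8(uint64_t reg)
{
    return (uint32_t)(reg & 0xff);
}

/* What the "bh" register holds (e.g. %dh within %rdx): bits 8..15. */
static uint32_t high8(uint64_t reg)
{
    return (uint32_t)((reg >> 8) & 0xff);
}

/* Model of the index extraction in lookup_32bit:
 * movzbl bl, movzbl bh, shrq $16, movzbl bl, movzbl bh. */
static void four_bytes(uint64_t src, uint32_t idx[4])
{
    idx[0] = low8(src);   /* index into t0 */
    idx[1] = high8(src);  /* index into t1 */
    src >>= 16;           /* shrq $16, src */
    idx[2] = low8(src);   /* index into t2 */
    idx[3] = high8(src);  /* index into t3 */
}
```

The zero-extension done by movzbl corresponds to the masked values being returned as full 32-bit integers.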
The specific names RGI1, RID1, etc. don't look familiar to me. I'll guess they are derived from the Twofish specification.
Token-pasting reference: http://gcc.gnu.org/onlinedocs/cpp/Concatenation.html#Concatenation