I am trying to implement SHA1 with SSE2 intrinsics in C. The initialization seems to work, but if I use
round1(testhashe, testhasha, testhashb, testhashc, testhashd, loadConstant(b[z]));
as the first round of my algorithm, I get wrong results. The constants and the input values have been checked and are correct, but the last value comes out wrong. My macros are:
#define rotthirty(val) (_mm_or_si128(_mm_slli_epi32(val,30),_mm_srli_epi32(val,2)))
#define f1(b,c,d) (_mm_xor_si128(d,_mm_and_si128(b, _mm_xor_si128(c, d))))
// Round functions
#define round1(A,B,C,D,E,w) \
temp = rotthirty(A);\
temp = _mm_add_epi32(temp,f1(B, C, D));\
temp = _mm_add_epi32(temp,k1);\
temp = _mm_add_epi32(temp,w);\
E = _mm_add_epi32(temp, E);\
B = rotthirty(B);\
These worked without problems before I switched to the SSE2 functions; I only replaced the operators with intrinsic calls. What am I doing wrong?
Output after this round, using the intrinsics and computing 4 SHA1 hashes at a time:
Vector: 67452301 67452301 67452301 67452301
Vector: 7bf36ae2 7bf36ae2 7bf36ae2 7bf36ae2
Vector: 98badcfe 98badcfe 98badcfe 98badcfe
Vector: 10325476 10325476 10325476 10325476
Vector: 734fe2b5 724fe2b5 8b4ee2b5 8a4ee2b5
Except for the last line, these are the correct values, as can be seen by running the working, SSE2-free code after round 1:
67452301
7bf36ae2
98badcfe
10325476
122fa21
The error was in the rotation: the round has to rotate A left by 5, not by 30. For anyone else who runs into this problem, I also want to address @jww's answer, since it reflects a misconception I have heard several times. If you restrict yourself to SSE2 intrinsics, you cannot use the SHA functions mentioned there, because they are not part of SSE2. You also do not have to swap the byte order when loading the values into the vectors; that can stay as shown above.
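For reference, here is a minimal sketch of what the corrected first-round step could look like. It is not the original code: `ROTL_EPI32`, `f1_epi32`, `round1_step` and the scalar helpers are names made up for this illustration, and `k1` is assumed to hold the standard SHA-1 round constant 0x5A827999 in all four lanes. The only functional change compared with the macro in the question is that A is rotated left by 5, while B is still rotated by 30.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate every 32-bit lane left by n bits (n: compile-time constant, 1..31). */
#define ROTL_EPI32(v, n) \
    _mm_or_si128(_mm_slli_epi32((v), (n)), _mm_srli_epi32((v), 32 - (n)))

/* f1 of SHA-1, written as d ^ (b & (c ^ d)), same as in the question. */
static inline __m128i f1_epi32(__m128i b, __m128i c, __m128i d)
{
    return _mm_xor_si128(d, _mm_and_si128(b, _mm_xor_si128(c, d)));
}

/* One step of rounds 0..19 for four independent hashes (hypothetical helper):
 *   E += rotl5(A) + f1(B, C, D) + K1 + w;   B = rotl30(B);                  */
static inline void round1_step(__m128i a, __m128i *b, __m128i c, __m128i d,
                               __m128i *e, __m128i w)
{
    const __m128i k1 = _mm_set1_epi32(0x5A827999); /* assumed round constant */
    __m128i temp = ROTL_EPI32(a, 5);               /* rotate A by 5, not 30  */
    temp = _mm_add_epi32(temp, f1_epi32(*b, c, d));
    temp = _mm_add_epi32(temp, k1);
    temp = _mm_add_epi32(temp, w);
    *e = _mm_add_epi32(temp, *e);
    *b = ROTL_EPI32(*b, 30);
}

/* Scalar reference of the same step, useful for comparing lane by lane. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

static inline void round1_step_scalar(uint32_t a, uint32_t *b, uint32_t c,
                                      uint32_t d, uint32_t *e, uint32_t w)
{
    *e += rotl32(a, 5) + (d ^ (*b & (c ^ d))) + 0x5A827999u + w;
    *b = rotl32(*b, 30);
}
```

Running both versions on the same inputs and comparing each lane of the vector result against the scalar result is a quick way to catch this kind of mistake when replacing operators with intrinsics.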