Any suggestion why?
The C code gives different values in different implementations (788f156dbbc97800 or 788f156dbbc87900, see the examples below). Manual calculation and the FPGA implementation give the "correct" value (788f156dbbc87900), but the original software expects the other one (788f156dbbc97800).
I am interested in the mechanics of why this occurs and, if possible, in Verilog/VHDL implementation examples.
C = 788f156dbbc97800
FPGA = 788f156dbbc87900
Calc = 788F156DBBC87900
#include <stdint.h>
#include <stdlib.h>

static inline uint64_t rotr64( const uint64_t w, const unsigned c ){
    return ( w >> c ) | ( w << ( 64 - c ) );
}

/* Blake2b's G function */
#define G(r,i,a,b,c,d) \
    do { \
        a = a + b; \
        d = rotr64(d ^ a, 32); \
        c = c + d; \
        b = rotr64(b ^ c, 24); \
        a = a + b; \
        d = rotr64(d ^ a, 16); \
        c = c + d; \
        b = rotr64(b ^ c, 63); \
    } while(0)

/* One round of the Blake2b compression function */
#define ROUND_LYRA(r) \
    G(r,0,v[ 0],v[ 4],v[ 8],v[12]); \
    G(r,1,v[ 1],v[ 5],v[ 9],v[13]); \
    G(r,2,v[ 2],v[ 6],v[10],v[14]); \
    G(r,3,v[ 3],v[ 7],v[11],v[15]); \
    G(r,4,v[ 0],v[ 5],v[10],v[15]); \
    G(r,5,v[ 1],v[ 6],v[11],v[12]); \
    G(r,6,v[ 2],v[ 7],v[ 8],v[13]); \
    G(r,7,v[ 3],v[ 4],v[ 9],v[14]);

static inline void reducedBlake2bLyra(uint64_t *v) {
    ROUND_LYRA(0);
}

int main(void)
{
    uint64_t *state = malloc(16 * sizeof (uint64_t));
    if (state == NULL)
        return 1;
    state[0] = 0x886405bc4ef729f4;
    state[1] = 0xeef412028f17fe52;
    state[2] = 0xc14af5d9d8c1b0d6;
    state[3] = 0xb4bf0fb0007f7cd8;
    state[4] = 0x7814c3ff1e1e6584;
    state[5] = 0x0198a05583c8a31a;
    state[6] = 0x495a3b6304587341;
    state[7] = 0x6489e4d1e286df36;
    state[8] = 0x42c008c5e5f0b5b8;
    state[9] = 0x81473c472e5c1272;
    state[10] = 0x6ee801e3f691cc77;
    state[11] = 0x3c4a0a05167955f4;
    state[12] = 0x8310219b03708b66;
    state[13] = 0x6bb0801460ab97ea;
    state[14] = 0x13272757d8f7e5fe;
    state[15] = 0x3524a4286f596d06;
    reducedBlake2bLyra(state);
    free(state);
    return 0;
}
Code examples to play with:
http://coliru.stacked-crooked.com/a/2838316d049a2c13 - 788f156dbbc97800
http://coliru.stacked-crooked.com/a/92024213ca202525 - 788f156dbbc87900
You have different endianness (byte order) in the objects.
In the code at your second link, you define unsigned long long a = 0xf429f74ebc056488, b = 0x84651e1effc31478; and add them. Observe that the low bytes 88 and 78 sum to 100, thus carrying a bit into the next byte, so that 64 and 14 sum to 78, which becomes 79 when the carry is added. Note that the low byte appears last in this source code.
In the code at your first link, you display the objects byte-by-byte, showing the low bytes first. What the code writes as f429f74ebc056488 is actually 0x886405bc4ef729f4 in the uint64_t, and what it writes as 84651e1effc31478 is actually 0x7814c3ff1e1e6584.
When you reverse the bytes in the numbers at the second link, the results match the code in the first link.