Any suggestion why?
The C code gives different values in different implementations (788f156dbbc97800 or 788f156dbbc87900, see the examples below). Manual calculation and the FPGA implementation give the "correct" value (788f156dbbc87900), but the original software expects the other one (788f156dbbc97800).
I am interested in the mechanics of why this occurs and, if possible, in Verilog/VHDL implementation examples.
C = 788f156dbbc97800
FPGA = 788f156dbbc87900
Calc = 788F156DBBC87900
#include <stdint.h>
#include <stdlib.h>

static inline uint64_t rotr64( const uint64_t w, const unsigned c ){
    return ( w >> c ) | ( w << ( 64 - c ) );
}

/* Blake2b's G function */
#define G(r,i,a,b,c,d) \
    do { \
        a = a + b; \
        d = rotr64(d ^ a, 32); \
        c = c + d; \
        b = rotr64(b ^ c, 24); \
        a = a + b; \
        d = rotr64(d ^ a, 16); \
        c = c + d; \
        b = rotr64(b ^ c, 63); \
    } while(0)

/* One round of the Blake2b compression function */
#define ROUND_LYRA(r) \
    G(r,0,v[ 0],v[ 4],v[ 8],v[12]); \
    G(r,1,v[ 1],v[ 5],v[ 9],v[13]); \
    G(r,2,v[ 2],v[ 6],v[10],v[14]); \
    G(r,3,v[ 3],v[ 7],v[11],v[15]); \
    G(r,4,v[ 0],v[ 5],v[10],v[15]); \
    G(r,5,v[ 1],v[ 6],v[11],v[12]); \
    G(r,6,v[ 2],v[ 7],v[ 8],v[13]); \
    G(r,7,v[ 3],v[ 4],v[ 9],v[14]);

static inline void reducedBlake2bLyra(uint64_t *v) {
    ROUND_LYRA(0);
}

int main(void)
{
    uint64_t *state = malloc(16 * sizeof (uint64_t));
    if (state == NULL)
        return 1;
    state[0] = 0x886405bc4ef729f4;
    state[1] = 0xeef412028f17fe52;
    state[2] = 0xc14af5d9d8c1b0d6;
    state[3] = 0xb4bf0fb0007f7cd8;
    state[4] = 0x7814c3ff1e1e6584;
    state[5] = 0x0198a05583c8a31a;
    state[6] = 0x495a3b6304587341;
    state[7] = 0x6489e4d1e286df36;
    state[8] = 0x42c008c5e5f0b5b8;
    state[9] = 0x81473c472e5c1272;
    state[10] = 0x6ee801e3f691cc77;
    state[11] = 0x3c4a0a05167955f4;
    state[12] = 0x8310219b03708b66;
    state[13] = 0x6bb0801460ab97ea;
    state[14] = 0x13272757d8f7e5fe;
    state[15] = 0x3524a4286f596d06;
    reducedBlake2bLyra(state);
    free(state);
    return 0;
}
Code examples to play with:
http://coliru.stacked-crooked.com/a/2838316d049a2c13 - 788f156dbbc97800
http://coliru.stacked-crooked.com/a/92024213ca202525 - 788f156dbbc87900
You have different endianness (byte order) in the objects.
In the code at your second link, you define unsigned long long a = 0xf429f74ebc056488, b = 0x84651e1effc31478; and add them. Observe that the low bytes 88 and 78 sum to 100, thus carrying a bit into the next byte, so that 64 and 14 sum to 78, which becomes 79 when the carry is added. Note that the low byte appears last in this source code.
In the code at your first link, you display the objects byte-by-byte, showing the low bytes first. What the code writes as f429f74ebc056488 is actually 0x886405bc4ef729f4 in the uint64_t, and what it writes as 84651e1effc31478 is actually 0x7814c3ff1e1e6584.
When you reverse the bytes in the numbers at the second link, the results match the code in the first link.