algorithm hash cryptography reverse-engineering

Reverse SHA-256 sigma0 function within complexity of O(n)?

Introduction

As a part of SHA-256 hashing algorithm, there's a function that is often being referred as σ1, or sigma0 for convenience. Basically, it takes X as input, where X is 32-bit unsigned value. Then converts it like this:

ROTATE_RIGHT(X, 7) ^ ROTATE_RIGHT(X, 18) ^ SHIFT_RIGHT(X, 3)

A bit of explanation, if you need one:

ROTATE_RIGHT(X, Y) - rotates X's bits to the right by Y
SHIFT_RIGHT(X, Y) - shifts X's bits to the right by Y, so the first Y bits of the result are always 0

Also, if you need the code, here's the full version in Python:

def rotate_right(x, y):
    return (((x & 0xffffffff) >> (y & 31)) | (x << (32 - (y & 31)))) & 0xffffffff

def shift_right(x, n):
    return (x & 0xffffffff) >> n

def sigma0(x):
    return rotate_right(x, 7) ^ rotate_right(x, 18) ^ shift_right(x, 3)

Reverse function

I started wondering if that thing is reversible, and, to my surprise, it didn't take long to write a function which, by given sigma0's output, returns input of that function, or, simply put, reverses sigma0 function. I won't put the code here, because it was written in Node.js and modified a lot by more complex needs of searching particular sigma0 inputs by masks, but I'd like to give you a basic idea of how I solved it, so maybe you could enlighten me with some fresh ideas on how to achieve what I need.

My solution is simple, but it is also recursive. We know that every output's bit is the result of XOR operation of two or three input's bits. So I made a dependence table so that I can see how are output's bits are being affected by input ones:

I:  00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31

R7  25,26,27,28,29,30,31,00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24
R18 14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,00,01,02,03,04,05,06,07,08,09,10,11,12,13
S3  zz,zz,zz,00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28
---------------------------------------------------------------------------------------------------
O:  00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31

What is this thing about? Say, in the output's 1st bit we have 1. For convenience, I'll write it as O[0], O[1], ... O[31], so O[x] is (x+1)th bit of the output. The same for input, marked as I.

So, O[0] == 1. In the table above we see that O[0] is the result of XOR operation of I[25] and I[14]. Which implies that one and only one of those input's bits must be 1. So at this point we could say that we can create two suitable masks for the input:

##############0#########1#######
##############1#########0#######

Those masks are key to the solution, at least, to mine. # means any value (0 or 1). When we create masks, we call the recursive function for the next bit, but preserving the mask. If we are out of possible masks that would suit previous mask, the previous mask has no solution, and if we reach 32nd bit, we're guaranteed to have no sharps in the masks, and this will be the answer.

First, I need to tell you that this thing works. But on Node.js it calculates every value for about 100ms and I have no idea what is the worst complexity of my recursion algorithm, because it's quite hard to measure. It doesn't satisfy me, and I broke my brains trying to solve this O(n).

Problem

I'm wondering if it's possible to write a function that reverses sigma0 within complexity of O(n), where n is amount of bits in the input/output and it equals to 32, without recursion, masks or trees, simply and fast.

I haven't concluded any mathematical proof for my statement, but I tested lots of different values and I can confidently claim that the amount of input values is equal to the amount of output values of this function, and both are equal to 2^32 - 1. In other words, for every output, there is one and only one possible input of sigma0 function.

Which gives me a thought that the fact sigma0 original function produces result with complexity of O(n) implies that the reverse function must have a solution that also works O(n).

If you mathematically prove me that this is impossible, I'd also accept this answer, but I haven't found anything that would indicate the impossibility of this task.

Resource-devouring workaround

In case I had free 16gb of ram, I'd be able to pre-calculate all the possible values into the file, and then load it into ram as a huge array. But it's not a solution since there are other 3 similar functions, and to do that for all of them I'd need 64gb of ram which is too expensive and excessive for this simple task.

UPD: Gaussian Elimination

Thanks to Artjom B.'s comment, I found a great way to solve XOR equations via Gaussian Elimination. Currently I'm trying to solve a matrix like this:

Input:  00000000100110101000111011101001
Output: 01110001101010000010010011100110

0:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 | 0
1:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 | 1
2:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 | 1
3:  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 | 1
4:  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 | 0
5:  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 | 0
6:  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 | 0
7:  1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 | 1
8:  0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 | 1
9:  0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 | 0
10: 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 | 1
11: 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 | 0
12: 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 | 1
13: 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 | 0
14: 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 | 0
15: 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 | 0
16: 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 | 0
17: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 | 0
18: 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 1
19: 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0
20: 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0
21: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 | 1
22: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 | 0
23: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | 0
24: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 | 1
25: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 | 1
26: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 | 1
27: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 | 0
28: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 | 0
29: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 | 1
30: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 | 1
31: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 | 0

Published the matrix so you could see how it looks and not waste your time on creating it by yourself. I'll update my question once I solved it or not.

Solution

If we view sigma0 as a function over a GF(2)³² vector, you will note that it's linear. Addition in GF(2)³² is just the binary XOR:

>>> sigma0(235 ^ 352124)
2045075788

>>> sigma0(235) ^ sigma0(352124)
2045075788

This means that if we can find sigma0(x0) = 0b1, sigma0(x1) = 0b10, etc we can easily invert anything bit-by-bit. We can easily find these inverses with z3:

import z3

def z3_sigma0(x):
    return z3.RotateRight(x, 7) ^ z3.RotateRight(x, 18) ^ z3.LShR(x, 3)

s = z3.Solver()
xs = [z3.BitVec(f"x{i}", 32) for i in range(32)]
for i in range(32):
    s.add(z3_sigma0(xs[i]) == (1 << i))
print(s.check())
m = s.model()
for i in range(32):
    print("x{:02} = 0x{:08x}".format(i, m[xs[i]].as_long()))

This instantly outputs:

sat
x00 = 0x185744e9
x01 = 0x30ae89d2
x02 = 0x615d13a4
x03 = 0xdaed63a1
x04 = 0x9cd03a8e
x05 = 0x08fdcc39
x06 = 0x11fb9872
x07 = 0x23f730e4
x08 = 0x5fb92521
x09 = 0xbf724a42
x10 = 0x57ee6948
x11 = 0xafdcd290
x12 = 0x76b358ec
x13 = 0xf531f531
x14 = 0xc36917ae
x15 = 0xb78f9679
x16 = 0x4615d13e
x17 = 0x947ce695
x18 = 0x19a4740f
x19 = 0x2b1facf7
x20 = 0x4e681d07
x21 = 0x84877ee7
x22 = 0x385344eb
x23 = 0x70a689d6
x24 = 0xf91a5745
x25 = 0xc36917af
x26 = 0xb78f967b
x27 = 0x4615d13a
x28 = 0x8c2ba274
x29 = 0x290afdcd
x30 = 0x4a42bf73
x31 = 0x94857ee6

Thus we can use this to make our inversion function:

sigma0_singleton_inverses = [
    0x185744e9, 0x30ae89d2, 0x615d13a4, 0xdaed63a1, 0x9cd03a8e, 0x08fdcc39,
    0x11fb9872, 0x23f730e4, 0x5fb92521, 0xbf724a42, 0x57ee6948, 0xafdcd290,
    0x76b358ec, 0xf531f531, 0xc36917ae, 0xb78f9679, 0x4615d13e, 0x947ce695,
    0x19a4740f, 0x2b1facf7, 0x4e681d07, 0x84877ee7, 0x385344eb, 0x70a689d6,
    0xf91a5745, 0xc36917af, 0xb78f967b, 0x4615d13a, 0x8c2ba274, 0x290afdcd,
    0x4a42bf73, 0x94857ee6
]

def inv_sigma0(x):
    r = 0
    for i in range(32):
        if x & (1 << i):
            r ^= sigma0_singleton_inverses[i]
    return r

And indeed:

>>> def test_inv_once():
...     r = random.randrange(2**32)
...     return inv_sigma0(sigma0(r)) == r
>>> all(test_inv_once() for _ in range(10**6))
True

The above can be written completely loopless and branchless:

def inv_sigma0(x):
    xn = ~x
    r  = (((xn >>  0) & 1) - 1) & 0x185744e9
    r ^= (((xn >>  1) & 1) - 1) & 0x30ae89d2
    r ^= (((xn >>  2) & 1) - 1) & 0x615d13a4
    r ^= (((xn >>  3) & 1) - 1) & 0xdaed63a1
    r ^= (((xn >>  4) & 1) - 1) & 0x9cd03a8e
    r ^= (((xn >>  5) & 1) - 1) & 0x08fdcc39
    r ^= (((xn >>  6) & 1) - 1) & 0x11fb9872
    r ^= (((xn >>  7) & 1) - 1) & 0x23f730e4
    r ^= (((xn >>  8) & 1) - 1) & 0x5fb92521
    r ^= (((xn >>  9) & 1) - 1) & 0xbf724a42
    r ^= (((xn >> 10) & 1) - 1) & 0x57ee6948
    r ^= (((xn >> 11) & 1) - 1) & 0xafdcd290
    r ^= (((xn >> 12) & 1) - 1) & 0x76b358ec
    r ^= (((xn >> 13) & 1) - 1) & 0xf531f531
    r ^= (((xn >> 14) & 1) - 1) & 0xc36917ae
    r ^= (((xn >> 15) & 1) - 1) & 0xb78f9679
    r ^= (((xn >> 16) & 1) - 1) & 0x4615d13e
    r ^= (((xn >> 17) & 1) - 1) & 0x947ce695
    r ^= (((xn >> 18) & 1) - 1) & 0x19a4740f
    r ^= (((xn >> 19) & 1) - 1) & 0x2b1facf7
    r ^= (((xn >> 20) & 1) - 1) & 0x4e681d07
    r ^= (((xn >> 21) & 1) - 1) & 0x84877ee7
    r ^= (((xn >> 22) & 1) - 1) & 0x385344eb
    r ^= (((xn >> 23) & 1) - 1) & 0x70a689d6
    r ^= (((xn >> 24) & 1) - 1) & 0xf91a5745
    r ^= (((xn >> 25) & 1) - 1) & 0xc36917af
    r ^= (((xn >> 26) & 1) - 1) & 0xb78f967b
    r ^= (((xn >> 27) & 1) - 1) & 0x4615d13a
    r ^= (((xn >> 28) & 1) - 1) & 0x8c2ba274
    r ^= (((xn >> 29) & 1) - 1) & 0x290afdcd
    r ^= (((xn >> 30) & 1) - 1) & 0x4a42bf73
    r ^= (((xn >> 31) & 1) - 1) & 0x94857ee6
    return r

The fastest version probably probably is this one, grouping by 16 bits at a time using a 2 × 2¹⁶ size lookup table (or similarly four lookups into a 4 × 2⁸ sized table).

sigma0_16bit_inverse_lo = [inv_sigma0(x)       for x in range(2**16)]
sigma0_16bit_inverse_hi = [inv_sigma0(x << 16) for x in range(2**16)]
def fast_inv_sigma0(x):
    return (sigma0_16bit_inverse_lo[x & 0xffff] ^
            sigma0_16bit_inverse_hi[(x >> 16) & 0xffff])