Search code examples
pythonpolynomials

Shamir's secret sharing(Python) - Unable to get back orginal secret when it's len > 10


Whenever I tried to get back the original value, I'll get "Hello World►/", "Hello Worlc", "Hello World►/", "Hello Worle♦S" or some other variants. I follow from this website for the implementation of splitting and reconstruction: https://www.geeksforgeeks.org/implementing-shamirs-secret-sharing-scheme-in-python/. This is my code:

import random
from math import ceil
from decimal import Decimal

FIELD_SIZE = 10**5

def reconstruct_secret(shares):
    sums = Decimal(0)
    for j, share_j in enumerate(shares):
        xj, yj = share_j
        prod = Decimal(1)
        for i, share_i in enumerate(shares):
            xi, _ = share_i
            if i != j:
                # Check if xi is equal to xj
                if xi == xj:
                    continue
                prod *= Decimal(xi) / (Decimal(xi) - Decimal(xj))

        prod *= Decimal(yj)
        sums += prod

    return int(round(sums))


def polynom(x, coefficients):
    point = 0
    for coefficient_index, coefficient_value in enumerate(coefficients[::-1]):
        point += x ** coefficient_index * coefficient_value
    return point


def coeff(t, secret):
    coeff = [random.randrange(0, FIELD_SIZE) for _ in range(t - 1)]
    coeff.append(secret)
    return coeff


def generate_shares(n, m, secret):
    coefficients = coeff(m, secret)
    shares = []

    for i in range(1, n+1):
        x = random.randrange(1, FIELD_SIZE)
        shares.append((x, polynom(x, coefficients)))

    return shares

if __name__ == '__main__':

    # (3,5) sharing scheme
    t, n = 3, 5

    tt = "Hello World!!!"
    secret = int.from_bytes(tt.encode(), byteorder='big')
    # Phase I: Generation of shares
    shares = generate_shares(n, t, secret)

    # Phase II: Secret Reconstruction
    pool = random.sample(shares, t)
    # Reconstruct the secret
    r_secret_int = reconstruct_secret(pool)
    r_secret_string = r_secret_int.to_bytes(
        (r_secret_int.bit_length() + 7) // 8, byteorder='big').decode('utf-8', 'ignore')
    print("Reconstructed secret:", r_secret_string)

I consulted chatgpt, it suggested changing the algorithm for reconstruct_secret(shares) or increasing the FIELD_SIZE but the code it gave, the secret either returned blank or some gibberish.


Solution

  • TL;DR: reconstruct_secret is buggy. It won't work for certain cases. It works for the approximation where secret is small, but if secret is large, it struggles to reconstruct the values from shares. I'll explain why here.


    Let's recap how this implementation of SSS works. Assuming that one has a polynomial of the form:

    P(x, n-1) = S + a*x + b*x^2 + ... c*x^(n-1)
    

    then, the coefficients a, b, ..., c can be found if you have the value of P and x at exactly n places on the curve. This is the Lagrange interpolation theorem. You can see more about this on the wikipedia page: https://en.wikipedia.org/wiki/Shamir%27s_secret_sharing

    The code smell is the rounding + use of Decimal:

    def reconstruct_secret(shares):
        sums = Decimal(0)
        for j, share_j in enumerate(shares):
            xj, yj = share_j
            prod = Decimal(1)
            for i, share_i in enumerate(shares):
                xi, _ = share_i
                if i != j:
                    # Check if xi is equal to xj
                    if xi == xj:
                        continue
                    prod *= Decimal(xi) / (Decimal(xi) - Decimal(xj))
    
            prod *= Decimal(yj)
            sums += prod
    
        return int(round(sums))
    

    When we generate shares, we do that by picking a random polynomial. We do that by randomly picking coefficients between 0 and FIELD_SIZE. The division here is not 100% accurate:

                    prod *= Decimal(xi) / (Decimal(xi) - Decimal(xj))
    

    Why does this not work?

    Thinking about this like pixelation

    If FIELD_SIZE << secret, the coefficients are very insignificant. Think of it this way:

    P(x, 2) = 1000000000000000 + 2x + 3x^2
    

    This polynomial is wholly dominated by the secret, 1000000000000000. It's very hard to find the exact shape of the curve, because it is "mostly flat" in the vicinity of x = 0. We use the x = 0 value to find the secret. You can see this as follows: the derivative of P(x, n) is much much less than the value at P(0, n). That means you need a high resolution solution to find the answer. Small errors will "corrupt" the data.

    Rounding error

    When you reconstruct the secret, the algorithm you use has rounding error. That's because when you do a/b, you don't get an exact answer. You get an answer that is approximately correct. Try it yourself:

    >>> 1/3
    0.3333333333333333
    

    Those 3s actually keep going forever. See this bit which corresponds to the basis polynomials:

    prod *= Decimal(xi) / (Decimal(xi) - Decimal(xj))
    

    To do that accurately, (xi - xj) shouldn't be "too big". If it is much much bigger than xi, we'll exceed the accuracy for representing decimals.

    So, if you have P(x) = 1000000000000000 + 2x + 3x^2, these coefficients vary a lot, and therefore, the rounding error for division dominates your result, and corrupts the secret.

    This problem is essentially catastrophic cancellation.

    The fix

    The easiest fix is to use a different algorithmic approach. The crux of the problem is that reconstruct_secret is error prone for large values of secret. The shares are fine, but that function won't accurately reconstruct the secret.

    See here for alternative approaches: https://en.wikipedia.org/wiki/Shamir%27s_secret_sharing#Python_code