
Get the 24th word from the first 23 words of a BIP-0039 mnemonic phrase


The code below takes the first 23 words of a valid mnemonic phrase and is supposed to find the last word, i.e. the checksum word derived from those 23 words.

But it's not coming up with the correct word.

Here is a valid mnemonic phrase: cute door network found clown neither slight common torch tissue project melt bottom marble tunnel aisle kitchen staff only unhappy measure census need miss

When I enter the first 23 words, the code returns "type" as the 24th word, which is not correct. I'm using the standard bip39 word list: https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt

import hashlib

def get_checksum_word(words_path: str):
    with open(words_path, 'r') as f:
        word_list = f.read().splitlines()

    # Get the first 23 words of the mnemonic
    mnemonic = input("Enter 23 words separated by spaces: ")
    words = mnemonic.strip().split()
    if len(words) != 23:
        raise ValueError("Invalid number of words in mnemonic")

    # Generate the binary string from the first 23 words
    binary_str = ''
    for word in words:
        index = word_list.index(word) + 1
        print(index)
        binary_str += bin(index)[2:].zfill(11)
    entropy_length = len(binary_str)

    # Calculate the checksum
    entropy_bytes = b''
    for i in range(0, entropy_length, 8):
        byte = int(binary_str[i:i+8], 2).to_bytes(1, 'big')
        entropy_bytes += byte
    checksum = hashlib.sha256(entropy_bytes).digest()[0]
    binary_str += bin(checksum)[2:].zfill(8)

    # Get the index of the 24th word
    index = int(binary_str[-11:], 2)
    word = word_list[index]

    print("The 24th word is:", word)

Solution

  • What you have in mind is not possible. Your mistake is probably the assumption that the last word is the checksum ("...the last word aka the checksum word..."). This is not correct: the checksum is only a part of the last word.

    This is explained in more detail below (see bip-0039.mediawiki):
    One word corresponds to 11 bits, so 24 words correspond to 24 * 11 = 264 bits. The first 256 bits are the entropy. The last 8 bits are the checksum, which consists of the first 8 bits of the SHA-256 hash of the entropy.
    With 23 words you have only 23 * 11 = 253 bits, i.e. 3 bits of the entropy are missing. Consequently, the checksum cannot be determined, and neither can the 24th word.
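    The bit arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name bip39_layout is made up for illustration, and it relies on the BIP-0039 rule that the checksum has one bit per 32 bits of entropy, so it makes up 1/33 of the total.

```python
BITS_PER_WORD = 11  # each BIP-0039 word encodes an 11-bit index (0..2047)

def bip39_layout(word_count):
    """Return (total_bits, entropy_bits, checksum_bits) for a mnemonic length."""
    total_bits = word_count * BITS_PER_WORD
    # Checksum is entropy/32 bits long, i.e. 1/33 of the total bit count.
    checksum_bits = total_bits // 33
    entropy_bits = total_bits - checksum_bits
    return total_bits, entropy_bits, checksum_bits

total, entropy, checksum = bip39_layout(24)
known = 23 * BITS_PER_WORD   # bits supplied by the first 23 words
missing = entropy - known    # entropy bits hidden in the 24th word

print(total, entropy, checksum, known, missing)  # prints: 264 256 8 253 3
```

    The 3 missing bits are exactly the leading 3 bits of the 24th word; its remaining 8 bits are the checksum.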


    This can be illustrated with this website, after entering all 24 words and setting the Show entropy details option.

    Alternatively, this can be illustrated with the posted code. First, fix the index lookup: index = word_list.index(word) (instead of word_list.index(word) + 1), because the word indices are 0-based.
    Then append the missing 3 bits of entropy to binary_str before the checksum is calculated: binary_str += "100". (The value 100 can be derived from the 24th word miss, which has index 1134, or binary 100 0110 1110: the first 3 bits are the last 3 bits of the entropy, the last 8 bits are the checksum.)
    With both changes, the code returns the correct word miss as the 24th word.
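    Since only 3 bits of entropy are unknown, 23 words are compatible with exactly 8 possible 24th words, one per 3-bit completion. The following is a minimal sketch of that enumeration, not the asker's code: the function name candidate_last_indices is hypothetical, and it works on 0-based word indices so that it does not depend on a word list file.

```python
import hashlib

def candidate_last_indices(indices):
    """Return the 8 possible 0-based indices of the 24th word.

    `indices` holds the 0-based word-list indices of the first 23 words.
    """
    if len(indices) != 23:
        raise ValueError("expected 23 word indices")
    # 23 words * 11 bits = 253 known bits of entropy
    known_bits = "".join(format(i, "011b") for i in indices)
    candidates = []
    for tail in range(8):  # every value of the 3 missing entropy bits
        entropy_bits = known_bits + format(tail, "03b")
        entropy = int(entropy_bits, 2).to_bytes(32, "big")
        # Checksum: first 8 bits (first byte) of SHA-256 over the entropy
        checksum = hashlib.sha256(entropy).digest()[0]
        # 24th word = 3 entropy bits followed by 8 checksum bits
        candidates.append((tail << 8) | checksum)
    return candidates
```

    With word_list loaded as in the question, the candidate words would be [word_list[i] for i in candidate_last_indices([word_list.index(w) for w in words])]. For the mnemonic in the question, the entry for the missing bits 100 should then correspond to index 1134, i.e. miss.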