I made a program to try and find the length x
of a key in a French vigenere ciphered text (should be a random string) by calculating the mean of all the index of coincidence of a text which is cut every x element, basically I'm trying to get the mean value of the IC closest to 0.06/0.07 according to this reference website, but my mean values look wrong.
Note: I'm only working with uppercase strings, and no punctuation at all, no special characters and no spaces.
I have 3 functions, one function gives me a list of occurrence letters in a text.
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def freq(txt):
"""
give back all occurences of letters in a list
"""
hist = [0.0] * len(alphabet)
for letter in txt:
hist[alphabet.index(letter)] += 1
return hist
def IC(occurences):
"""
this function takes a list of occurences of letters from a text and calculates the IC
and returns the IC
"""
sum_ = 0
n = sum(occurences)
if n == 1:
return 0
for occ_letter in occurences:
sum_ += occ_letter * (occ_letter - 1)
sum_ /= n * (n - 1)
return sum_
# These two functions work well (or so I believe), but when I wrote this one :
# This function could be wrong
def key_length(cipher):
"""
cipher is the ciphered text.
Find the length of the key by cutting the text and trying to find a mean value over 0.06
Returns the length of the key
"""
max_key_len = 20 # (the length of the key will never be over 20)
max_ = 0
index_max = 0
for x in range(1, max_key_len + 1): # Going from 1 to 20
pieces = [cipher[i : i + x] for i in range(0, len(cipher), x)]
# calculating the mean value of all the piece in text cut in x pieces
mean_IC = sum(IC(freq(piece)) for piece in pieces) / len(pieces)
print(mean_IC, x)
if mean_IC > max_:
max_ = mean_IC
index_max = x
if mean_IC > 0.06:
break
return index_max
Tracing the mean value of the IC gives me this on this sample text :
0.043117408906882586 20
0.03727725599070627 19
0.03944106378183457 18
0.04047088532382649 17
0.04331597222222223 16
0.04154995331465919 15
0.037422037422037424 14
0.04287251210328133 13
0.037350246652572215 12
0.03882301273605619 11
0.04291938997821349 10
0.04191033138401558 9
0.04185267857142856 8
0.03522504892367906 7
0.03686274509803924 6
0.04554455445544554 5
0.04199475065616799 4
0.043392504930966455 3
0.03557312252964427 2
0.0 1
None of these values are over 0.06 (but some should be since 7 is the real key length). I've looked online at a cryptography website that does exactly this (click on the button CALCULATE PROBABLE KEY-LENGTHS
), and I got something really different with the same text where L is the key length, in conclusion there is something wrong, but I can't seem to find it :
L=7 IC ≈ 0.07832 ± 0.006 --> This is the key
L=14 IC ≈ 0.07986 ± 0.013
L=21 IC ≈ 0.07504 ± 0.015
L=5 IC ≈ 0.04717 ± 0.028
L=10 IC ≈ 0.04667 ± 0.028
L=15 IC ≈ 0.04646 ± 0.029
L=19 IC ≈ 0.04616 ± 0.029
L=20 IC ≈ 0.04562 ± 0.029
L=22 IC ≈ 0.04523 ± 0.03
L=11 IC ≈ 0.04516 ± 0.03
L=4 IC ≈ 0.04515 ± 0.03
L=3 IC ≈ 0.04494 ± 0.03
L=1 IC ≈ 0.04487 ± 0.03
L=17 IC ≈ 0.04462 ± 0.03
L=2 IC ≈ 0.04453 ± 0.03
L=9 IC ≈ 0.04373 ± 0.031
L=23 IC ≈ 0.04352 ± 0.031
L=13 IC ≈ 0.04336 ± 0.032
L=8 IC ≈ 0.04319 ± 0.032
L=6 IC ≈ 0.04278 ± 0.032
L=26 IC ≈ 0.0426 ± 0.032
L=12 IC ≈ 0.04199 ± 0.033
L=16 IC ≈ 0.04128 ± 0.034
L=25 IC ≈ 0.04131 ± 0.034
L=18 IC ≈ 0.04006 ± 0.035
L=24 IC ≈ 0.03681 ± 0.038
In key length guessing, you don't compute pieces using pieces[i:i+x]
, this will not provide with the proper cosets you will need to perform the IC evaluation.
Rather you might want to define a cut in columns function:
def cut_in_columns(text, k):
"""
Cut text in k columns
"""
columns = [[] for _ in range(k)]
for i, l in enumerate(text):
columns[i % k].append(l)
return columns
which will provide you the right pieces
.
For the reason why this is true, I may redirect you to this post: https://pages.mtu.edu/~shene/NSF-4/Tutorial/VIG/Vig-IOC-Len.html on key length guessing.
Replacing it in your code yields the desired result.
The fundamental property lies in the definition of these ciphers methods.
BTW, you should test the code you post, you have an undefined variable (french moyenne
mixed with mean
) and early return in your maximum computations which are not even maximum computations… You should ensure that your minimal working example is indeed working.