I wrote the code below to generate a list containing 25 lists, each with 40 elements. The main requirement is a low level of similarity between the element sequences of all the lists (I tried to apply SequenceMatcher from difflib for this). Although the loop is supposed to stop when the number of inner lists reaches 25, I get 32 inner lists.
Here is my code:
import random
from difflib import SequenceMatcher

def string_converter(input_list):
    string = ""
    for m in input_list:
        string += str(m)
    return string

lists = []
strings = []
e = 0
while e <= 25:
    list_one = []
    n = 0
    for i in range(40):
        if 7 < n < 33:
            i = random.randint(0, 3)
            list_one.append(i)
            n += 1
        else:
            i = random.randint(0, 2)
            list_one.append(i)
            n += 1
    list_string = string_converter(list_one)
    if e == 0:
        strings.append(list_string)
        lists.append(list_one)
        e = 1
    else:
        for s in strings:
            if SequenceMatcher(None, list_string, s).ratio() < 0.7:
                strings.append(list_string)
                lists.append(list_one)
                e += 1
print(e)
print(lists)
print(len(lists))
print(strings)
Your problem is this loop, which can append multiple copies of list_one to lists as you iterate over strings:
for s in strings:
    if SequenceMatcher(None, list_string, s).ratio() < 0.7:
        strings.append(list_string)
        lists.append(list_one)
        e += 1
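To see the effect in isolation, here is a minimal sketch with made-up data (the three stored strings, the candidate, and the `appended` counter are illustrative assumptions, not values from the question). The candidate differs from all three stored strings, so it is appended once per stored string rather than once overall:

```python
from difflib import SequenceMatcher

# Hypothetical data: three stored strings and one new candidate.
strings = ["0000000000", "1111111111", "2222222222"]
candidate = "3333333333"

appended = 0
for s in strings:
    # Every stored string that is dissimilar enough triggers its own append.
    if SequenceMatcher(None, candidate, s).ratio() < 0.7:
        strings.append(candidate)
        appended += 1

print(appended)  # 3
```

The copies appended during the loop are also visited by it, but each one compares equal to the candidate (ratio 1.0), so they do not trigger further appends.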
What you need to do is check whether all SequenceMatcher ratios are < 0.7 and append only if they are. Something like this:
if all(SequenceMatcher(None, list_string, s).ratio() < 0.7 for s in strings):
    strings.append(list_string)
    lists.append(list_one)
    e += 1
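Putting it together, one possible complete version is sketched below. It is a rewrite under a few assumptions rather than the only way to do it: the names `TARGET`, `LENGTH`, `THRESHOLD` and `make_candidate` are introduced here for illustration, and the loop tests `len(lists)` directly instead of keeping a separate counter. That last change also avoids an off-by-one in the question's `while e <= 25`, which would still run once more when `e` is 25.

```python
import random
from difflib import SequenceMatcher

TARGET = 25      # number of inner lists wanted
LENGTH = 40      # elements per inner list
THRESHOLD = 0.7  # a new list must score below this against every stored one

def make_candidate():
    # Positions 8..32 draw from 0-3, the others from 0-2, as in the question.
    return [random.randint(0, 3) if 7 < n < 33 else random.randint(0, 2)
            for n in range(LENGTH)]

lists = []
strings = []
while len(lists) < TARGET:
    candidate = make_candidate()
    candidate_string = "".join(map(str, candidate))
    # Append once, and only if the candidate is dissimilar to ALL stored strings.
    # all() over an empty list is True, so the first candidate is always kept.
    if all(SequenceMatcher(None, candidate_string, s).ratio() < THRESHOLD
           for s in strings):
        strings.append(candidate_string)
        lists.append(candidate)

print(len(lists))  # 25
```

Because `all()` is true for an empty iterable, the special case for the first list is no longer needed.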