I am trying to generate a random barcode_list containing 6 unique barcodes, where every pair of barcodes has a Hamming distance of at least 3. The problem is that the program produces a list with duplicates, and with barcodes that are closer together than that. The code is below.
import random
nucl_list = ['A', 'C', 'G', 'T']
length = 6
number = 6
attempts = 1000
barcode_list = []
tested = []
def make_barcode():
    """Generates a random barcode from nucl_list"""
    barcode = ''
    for i in range(length):
        barcode += random.choice(nucl_list)
    return barcode
def distance(s1, s2):
    """Calculates the hamming distance between s1 and s2"""
    length1 = len(s1)
    length2 = len(s2)
    # Initiate 2-D array
    distances = [[0 for i in range(length2 + 1)] for j in range(length1 + 1)]
    # Add in null values for the x rows and y columns
    for i in range(0, length1 + 1):
        distances[i][0] = i
    for j in range(0, length2 + 1):
        distances[0][j] = j
    for i in range(1, length1 + 1):
        for j in range(1, length2 + 1):
            cost = 0
            if s1[i - 1] != s2[j - 1]:
                cost = 1
            distances[i][j] = min(distances[i - 1][j - 1] + cost, distances[i][j - 1] + 1, distances[i - 1][j] + 1)
    min_distance = distances[length1][length2]
    for i in range(0, length1 + 1):
        min_distance = min(min_distance, distances[i][length2])
    for j in range(0, length2 + 1):
        min_distance = min(min_distance, distances[length1][j])
    return min_distance
def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    if new_barcode not in barcode_list:
        for barcode in barcode_list:
            dist = distance(barcode, new_barcode)
            if dist >= 3:
                barcode_list.append(new_barcode)
            else:
                pass
    else:
        pass
# make first barcode
first_barc = ''
for i in xrange(length):
    first_barc += random.choice(nucl_list)
barcode_list.append(first_barc)
while len(tested) < attempts:
    if len(barcode_list) < number:
        compare_barcodes()
    else:
        break
barcode_list.sort()
print barcode_list
I think my issue is with the last while loop: I want compare_barcodes to keep generating barcodes until the list holds enough that fit the criteria (not a duplicate, and not closer than Hamming distance 3 to any barcode already generated).
@Jkdc's answer is correct, +1 for him. Your original code is almost there. My suggestion: move the new_barcode not in barcode_list check inside your for loop and combine it with the distance test, so that a candidate is rejected as soon as new_barcode == barcode or distance(barcode, new_barcode) < 3 for any stored barcode. Append only after the loop has finished without a rejection. That way you never add duplicates, and a new barcode is accepted only if it is at least 3 away from every barcode already in the list, not just from one of them:
def compare_barcodes():
    """Generates a new barcode and compares with barcodes in barcode_list"""
    new_barcode = make_barcode()
    # keep track of # of barcodes tested
    tested.append(new_barcode)
    for barcode in barcode_list:
        # reject a duplicate, or a barcode too close to an existing one
        if new_barcode == barcode or distance(barcode, new_barcode) < 3:
            break
    else:
        # the loop ended without break: every stored barcode passed the check
        barcode_list.append(new_barcode)
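A related point that may be worth checking, offered as a sketch and not a diagnosis: the distance function in the question fills a dynamic-programming table with insertion and deletion costs, which looks like an edit-distance calculation and not a Hamming distance. For equal-length barcodes, a plain Hamming distance just counts mismatched positions. The names hamming_distance and is_far_enough below are hypothetical helpers, not part of the original code, and the threshold of 3 is taken from the question.

```python
def hamming_distance(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def is_far_enough(candidate, barcodes, min_dist=3):
    """Return True only if candidate is at least min_dist from EVERY barcode.

    all() over an empty collection is True, so the first candidate is accepted.
    A duplicate has distance 0 and is therefore rejected as well.
    """
    return all(hamming_distance(candidate, b) >= min_dist for b in barcodes)


hamming_distance("AAAAAA", "AAATTT")           # 3 mismatched positions
is_far_enough("AAAAAT", ["AAAAAA", "TTTTTT"])  # False: 1 away from the first
```

The key design choice is all(): appending inside a loop whenever one comparison passes accepts a candidate that is far from one barcode but close to another.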
Another suggestion: if you want to avoid duplicates, you can store your barcodes in a set, which holds unordered, unique elements.
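As a rough illustration of the set idea, here is a minimal sketch. It assumes equal-length barcodes, a mismatch-counting Hamming distance, and the values from the question (6 barcodes, length 6, minimum distance 3, 1000 attempts). The function names and parameters are hypothetical, not from the original code.

```python
import random

NUCLEOTIDES = "ACGT"


def hamming_distance(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def pick_barcodes(count=6, length=6, min_dist=3, attempts=1000, rng=random):
    """Try up to `attempts` random candidates; return the accepted barcodes, sorted."""
    barcodes = set()  # a set cannot hold the same barcode twice
    for _ in range(attempts):
        if len(barcodes) == count:
            break
        candidate = "".join(rng.choice(NUCLEOTIDES) for _ in range(length))
        # accept only if the candidate is far enough from every stored barcode
        if all(hamming_distance(candidate, b) >= min_dist for b in barcodes):
            barcodes.add(candidate)
    return sorted(barcodes)


print(pick_barcodes())
```

The function can return fewer than count barcodes if the attempts run out, so it is worth checking the length of the result before using it.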