I have a string of characters and a list of characters. I want to create a dictionary in which the keys are the characters of the string and each value is the list without the key character.
A string of characters:
sequence = 'ATGCG'
The list:
bases = ['C', 'T', 'A', 'G']
The resulting dictionary would be:
{'A': ['C', 'T', 'G'],
'T': ['C', 'A', 'G'],
'G': ['C', 'T', 'A'],
'C': ['T', 'A', 'G'],
'G': ['C', 'T', 'A'],
}
I tried the following code but got a dictionary with only 4 items:
variations = {current_base: [base for base in bases if current_base != base]
for current_base in sequence}
I'd love to get ideas regarding what I'm doing wrong. Thanks.
What you want to do is impossible: a dictionary cannot have duplicate keys.
{'A': ['C', 'T', 'G'],
'T': ['C', 'A', 'G'],
'G': ['C', 'T', 'A'],
'C': ['T', 'A', 'G'],
'G': ['C', 'T', 'A'], ## this is impossible
}
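As a minimal sketch of why the comprehension from the question yields four entries rather than five: a repeated key does not add a new entry, it overwrites the existing one. The variable names are taken from the question.

```python
sequence = 'ATGCG'
bases = ['C', 'T', 'A', 'G']

# Same comprehension as in the question: 'G' occurs twice in the sequence,
# so its second occurrence overwrites the entry created by the first.
variations = {current_base: [base for base in bases if current_base != base]
              for current_base in sequence}

print(len(sequence))     # number of characters in the sequence
print(len(variations))   # number of distinct keys actually stored
print(variations['G'])   # the single entry kept for 'G'
```

The sequence has five characters, but the dictionary keeps only one entry per distinct base.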
You can use a list of tuples instead. I am taking the opportunity to show you a more efficient method using Python sets:
sequence = 'ATGCG'
bases = set('ACGT')
[(b, list(bases.difference(b))) for b in sequence]
NB. It is even more efficient to precompute the differences, since the DNA sequence may be very long but there are only 4 bases:
sequence = 'ATGCG'
bases = set('ACGT')
diffs = {b: list(bases.difference(b)) for b in bases}
[(b, diffs[b]) for b in sequence]
output:
[('A', ['T', 'C', 'G']),
('T', ['A', 'C', 'G']),
('G', ['T', 'A', 'C']),
('C', ['T', 'A', 'G']),
('G', ['T', 'A', 'C'])]
If you do need a dictionary, you can use the position in the sequence as the key:
{i: list(bases.difference(b)) for i, b in enumerate(sequence)}
output:
{0: ['T', 'C', 'G'],
1: ['A', 'C', 'G'],
2: ['T', 'A', 'C'],
3: ['T', 'A', 'G'],
4: ['T', 'A', 'C']}
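The order of the bases inside each list above comes from iterating over a set, which Python does not guarantee. If a stable order matters, one possible sketch is to precompute the alternatives from the ordered list given in the question instead of from a set. The names `others`, `pairs` and `by_position` are hypothetical, chosen only for this illustration.

```python
sequence = 'ATGCG'
bases = ['C', 'T', 'A', 'G']  # ordered list, as in the question

# Precompute, once per base, the other three bases in list order.
others = {b: [x for x in bases if x != b] for b in bases}

# One (base, alternatives) pair per position; repeated bases are kept.
pairs = [(b, others[b]) for b in sequence]

# Position-keyed dictionary, if a dict is required.
by_position = {i: others[b] for i, b in enumerate(sequence)}

# Note: entries for the same base share one list object.
# Use list(others[b]) instead if the lists will be modified later.
print(pairs)
print(by_position)
```

This keeps the precomputation idea (only 4 lookups are built, regardless of sequence length) while making the result reproducible across runs.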