
Dictionary comprehension with a nested list


I have a string of characters and a list of characters. I want to create a dictionary in which the keys are the characters of the string and each value is the list without the key character.

A string of characters:

sequence = 'ATGCG'

The list:

bases = ['C', 'T', 'A', 'G']

The resulting dictionary would be:

{'A': ['C', 'T', 'G'],
 'T': ['C', 'A', 'G'],
 'G': ['C', 'T', 'A'],
 'C': ['T', 'A', 'G'],
 'G': ['C', 'T', 'A'],
}

I tried the following code but got a dictionary with only 4 items:

variations = {current_base: [base for base in bases if current_base != base]
              for current_base in sequence}

I'd love to get ideas regarding what I'm doing wrong. Thanks.


Solution

  • What you want is impossible: a dictionary cannot have duplicate keys.

    {'A': ['C', 'T', 'G'],
     'T': ['C', 'A', 'G'],
     'G': ['C', 'T', 'A'],
     'C': ['T', 'A', 'G'],
     'G': ['C', 'T', 'A'],  # impossible: 'G' is already a key
    }
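To make this concrete, here is a minimal sketch that runs the comprehension from the question and counts the keys. It only uses the names from the question; the `print` calls are added for illustration.

```python
sequence = 'ATGCG'
bases = ['C', 'T', 'A', 'G']

# The comprehension from the question, unchanged.
variations = {current_base: [base for base in bases if current_base != base]
              for current_base in sequence}

print(len(sequence))     # 5 characters in the sequence
print(len(variations))   # 4 keys: the second 'G' overwrote the first
print(list(variations))  # ['A', 'T', 'G', 'C']
```

The code itself is fine; the second 'G' simply replaces the entry created by the first one, so only 4 keys remain.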
    

    You can use a list of tuples instead. Here is also a more efficient method using Python sets:

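    sequence = 'ATGCG'
    bases = set('ACGT')  # a string is already iterable, so list() is not needed
    # Pair each base in the sequence with the other three bases
    [(b, list(bases.difference(b))) for b in sequence]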
    

    NB. It is even more efficient to precompute the differences, since the DNA sequence may be very long but there are only 4 bases:

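    sequence = 'ATGCG'
    bases = set('ACGT')
    # Compute the "other bases" list once per base (4 entries in total)
    diffs = {b: list(bases.difference(b)) for b in bases}
    # Then each position in the sequence is a plain dictionary lookup
    [(b, diffs[b]) for b in sequence]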
    

    Output (the order inside each list may differ, since sets are unordered):

    [('A', ['T', 'C', 'G']),
     ('T', ['A', 'C', 'G']),
     ('G', ['T', 'A', 'C']),
     ('C', ['T', 'A', 'G']),
     ('G', ['T', 'A', 'C'])]
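Because sets are unordered, the lists above can come out in a different order from one run to the next. If the order of the original `bases` list should be kept, one possible variation is to replace the set difference with a list comprehension. This is only a sketch under that assumption; the name `pairs` is introduced here for illustration.

```python
sequence = 'ATGCG'
bases = ['C', 'T', 'A', 'G']

# Precompute, for each base, the other bases in their original list order.
diffs = {b: [other for other in bases if other != b] for b in bases}

# One (base, other_bases) tuple per position, so repeated bases are kept.
pairs = [(b, diffs[b]) for b in sequence]
print(pairs)
```

This gives the same lists as in the question, at the cost of a linear scan of `bases` per base, which is negligible for 4 items.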
    
    Alternative, using the position as the key:

    {i: list(bases.difference(b)) for i, b in enumerate(sequence)}
    

    Output:

    {0: ['T', 'C', 'G'],
     1: ['A', 'C', 'G'],
     2: ['T', 'A', 'C'],
     3: ['T', 'A', 'G'],
     4: ['T', 'A', 'C']}
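The position-keyed version can also reuse the precomputed differences instead of recomputing the set difference at every position. The sketch below assumes that a stable order inside each list is wanted, so it uses `sorted`; that call and the name `by_position` are additions for illustration.

```python
sequence = 'ATGCG'
bases = set('ACGT')

# One sorted list of "other bases" per base, computed only 4 times.
diffs = {b: sorted(bases - {b}) for b in bases}

# Map each position in the sequence to the precomputed list for its base.
by_position = {i: diffs[b] for i, b in enumerate(sequence)}
print(by_position)
```

Note that positions holding the same base share one list object here, so copy the list (for example with `list(diffs[b])`) if the values will be modified independently.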