python, dictionary, fasta

Is there a special value that doesn't insert a key in a dictionary?


Is there a way of assigning a special key to a dictionary that actually does nothing?

I want to do something like:

mydict = {}
key, value = 'foo', 'bar'
mydict[key] = value   # now mydict is {'foo': 'bar'}

Now here I want some "special" value of key such that when I run:

mydict[key] = value

It doesn't actually do anything, so mydict is still {'foo': 'bar'} (no extra keys or values added)

I tried using:

d[None] = None   # It actually adds {None: None} to the dict
d[] = []         # Invalid syntax

Why I need this:

Well, it's basically to handle an initial case.

I have a file in FASTA format:

>id_3362
TGTCAGTGTTCCCCGTGGCCCTGCGGTTGGAATTGCAGCGGGTCGCTTTAGTTCTGGCAT
ATATTTTGACGGTGCCGGCCGGCGATACTGACGTGTGAGGACTTGAATTTGTACCAGCGC
AACACTTCCAAAGCCTGGACTAGGTTGT
>id_4743
CGGGGGATCTAATGTGGCTGCCACGGGTTGAAAAATGG
>id_5443
ATATTTTGACGGTGCCGGCCGGCGATACTGACGTGTGAGGACTTGAATTTGTACCAGCGC
AACACTTCCAAAGCCTGGACTAGGTTGT

My approach is to read line by line, concatenating the lines into a sequence until the next key is found (line starting with >). Then I save the key (id) with the associated value (sequence) in a dictionary, update the key and start accumulating the next sequence.

Of course, I could duplicate some code to handle the first case (which I don't think is a clean approach), or I could put an if inside the loop that reads each line (which would execute every time).

So the cleanest approach would be, every time an id is found, to save the previous id with the accumulated seq to the dictionary. But to handle the first line I need some special value for the key.

Here's my code:

def read_fasta(filename):
    mydict = {}
    id = None      # this has to be the special value I'm looking for
    seq = ''

    with open(filename) as f:
        for line in f:
            if line[0] == '>':
                mydict[id] = seq             # save current id and seq
                id = line[1:].rstrip('\n')   # update id
                seq = ''                     # clean seq
            else:
                seq += line.rstrip('\n')     # accumulate seq

    mydict[id] = seq                         # save the last record
    return mydict

As you can see, in this code the first line will insert the entry {None: ''} into the dictionary.

I could of course delete this key at the very end, but I'm wondering if I can have an initial value that doesn't insert anything when executed.
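
For reference, the delete-at-the-end option could look roughly like the sketch below. The function name `parse_with_cleanup` and the choice to take an iterable of lines instead of a filename are assumptions made only so the example is self-contained. It relies on `dict.pop` with a default, which does not raise if the key is missing.

```python
def parse_with_cleanup(lines):
    """Hypothetical variant: insert the placeholder key, then drop it at the end."""
    records = {}
    current_id = None                      # placeholder key, removed before returning
    seq = ''

    for line in lines:
        if line.startswith('>'):
            records[current_id] = seq      # the first header stores {None: ''}
            current_id = line[1:].rstrip('\n')
            seq = ''
        else:
            seq += line.rstrip('\n')

    records[current_id] = seq              # save the last record
    records.pop(None, None)                # remove the placeholder; no error if absent
    return records


sample = [">id_1\n", "ACGT\n", "TTGA\n", ">id_2\n", "CCGG\n"]
print(parse_with_cleanup(sample))   # {'id_1': 'ACGTTTGA', 'id_2': 'CCGG'}
```

Note that any sequence lines appearing before the first header end up under the `None` key and are discarded with it.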

Any suggestions?


Solution

  • You could of course do:

    id = None
    

    then:

    if id is not None: mydict[id] = seq
    

    If you want to avoid the insertion without an if test, you could instead start with a non-hashable value:

    id = []
    

    then catch the TypeError raised for the unhashable key. It is ugly, but it works, and the exception adds almost no overhead because it is raised only once, on the first header line.

    try:
        mydict[id] = seq
    except TypeError:
        pass
    

    Aside: if speed is your concern, don't build the sequence by string concatenation:

    seq += line.rstrip('\n')
    

    can be very slow on long sequences, because strings are immutable and each += may copy everything accumulated so far. Instead:

    • define seq as a list: seq = []
    • append lines to seq: seq.append(line.rstrip('\n'))
    • at the end, build the final string: seq = "".join(seq)
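
Putting the suggestions above together, a minimal sketch might look like the following. The names `parse_fasta` and `parse_fasta_sentinel` are hypothetical, and both take an iterable of lines (an open file works too) so that the example runs without a file on disk. The first uses the `is not None` guard, the second the unhashable sentinel, and both accumulate fragments in a list and join once per record.

```python
def parse_fasta(lines):
    """Guard variant: skip the save until a header has been seen."""
    records = {}
    current_id = None                  # None means "no header seen yet"
    chunks = []                        # sequence fragments of the current record

    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            if current_id is not None:
                records[current_id] = ''.join(chunks)
            current_id = line[1:]
            chunks = []
        else:
            chunks.append(line)

    if current_id is not None:         # the last record has no following header
        records[current_id] = ''.join(chunks)
    return records


def parse_fasta_sentinel(lines):
    """Sentinel variant: an unhashable initial id makes the first save fail."""
    records = {}
    current_id = []                    # a list cannot be used as a dict key
    chunks = []

    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            try:
                records[current_id] = ''.join(chunks)
            except TypeError:          # raised only while current_id is the sentinel
                pass
            current_id = line[1:]
            chunks = []
        else:
            chunks.append(line)

    try:
        records[current_id] = ''.join(chunks)
    except TypeError:                  # input contained no header at all
        pass
    return records


sample = [">id_1\n", "ACGT\n", "TTGA\n", ">id_2\n", "CCGG\n"]
print(parse_fasta(sample))             # {'id_1': 'ACGTTTGA', 'id_2': 'CCGG'}
```

With a real file this would be called as `with open(filename) as f: records = parse_fasta(f)`. The guard version is probably the easier one to read; the sentinel version only shows that the try/except idea also works.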