Tags: python, bioinformatics, vcf-variant-call-format

Create a new dictionary with each iteration of loop


I am trying to extract positions and SNPs from a VCF file. I have written the following so far. But how can I change the name of the dictionary so that I end up with one dictionary for each input file?

For example, I run the script like this: python vcf_compare.py file1.vcf file2.vcf file3.vcf

import sys

import vcf

for variants in sys.argv[1:]:
    file1 = {} 
    vcf_reader = vcf.Reader(open(variants))
    for record in vcf_reader:
        pos = record.POS
        alt = record.ALT
        ref= record.REF
        snps[pos]=ref,alt

So for argv[1], a dictionary called file1 is created. How can I make the dictionary's name change to, e.g., file2 on the second iteration of the loop?


Solution

  • Short answer: you can't, at least not cleanly. Python has no sensible way to invent a new variable name on each pass through a loop, which frustrates many early programmers. The fix: another dictionary! Outside of your variants for loop, create a second dictionary and store each file's SNP dictionary in it, using the filename as the key. Example (you can't just copy and paste this, because I don't know how to use the vcf library):

    import sys
    
    import vcf
    
    all_files = {}  # maps each filename to its own SNP dictionary
    for variants in sys.argv[1:]:
        # The question creates file1 but fills snps, so I'm assuming
        # the two were meant to be the same dictionary.
        snps = {}
        with open(variants) as handle:  # closes the file when the block ends
            vcf_reader = vcf.Reader(handle)
            for record in vcf_reader:
                pos = record.POS
                alt = record.ALT
                ref = record.REF
                snps[pos] = (ref, alt)
        all_files[variants] = snps
    

    Every entry in sys.argv is a string, so variants here is the filename exactly as it was typed on the command line. If you want a different key (say, file1, file2, ...), replace the variants in all_files[variants] with the string you want to use as its key.
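
To show how the resulting dictionary of dictionaries can be used, here is a minimal sketch that runs without the vcf library. The parse_snps helper and the fake_files data are made up for illustration: parse_snps is only a stand-in for vcf.Reader, and it assumes tab-separated data lines in the usual VCF column order (CHROM, POS, ID, REF, ALT).

```python
import io


def parse_snps(handle):
    """Map POS -> (REF, ALT) for each data line of a VCF-like text stream.

    Hypothetical stand-in for vcf.Reader, for illustration only.
    """
    snps = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/meta lines and blank lines
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        snps[pos] = (ref, alt)
    return snps


# Made-up in-memory "files" standing in for file1.vcf and file2.vcf.
fake_files = {
    "file1.vcf": "#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t250\t.\tC\tT\n",
    "file2.vcf": "#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t300\t.\tG\tA\n",
}

# One outer dictionary, keyed by filename, holding one SNP dictionary per file.
all_files = {}
for name, text in fake_files.items():
    all_files[name] = parse_snps(io.StringIO(text))

# Look up one file's SNPs by its name instead of by a variable name.
print(all_files["file2.vcf"][300])  # ('G', 'A')

# Positions that appear in both files.
shared = set(all_files["file1.vcf"]) & set(all_files["file2.vcf"])
print(sorted(shared))  # [100]
```

Because the filename is just a key, the same loop works for any number of input files, and comparing files becomes a matter of looking up two keys in all_files.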