I have created a dictionary of contigs and their lengths from file1. I also have file2, which is BLAST output in tabular format. It contains alignments for some (but not all) of the contigs, plus additional information such as where each match starts and ends. To calculate query and subject coverage, I need to associate the lengths from file1 with the alignments in file2. How can I do that? Thanks.
Assuming file1 is:
contig1 134
contig2 354
contig3 345
Your script would look like this:
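import re

# Build the contig -> length lookup from file1
contigDict = {}
with open('file1') as c1:
    for line in c1:
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split()
        contigDict[key] = value

# Collect every contig name mentioned in the BLAST output
with open('file2') as c2:
    scrambled_text = c2.read()
contigs = re.findall(r'contig\d+', scrambled_text)

# Keep the contigs seen in file2, skipping names that are missing from file1
output = {}
for contig in contigs:
    if contig in contigDict:
        output[contig] = contigDict[contig]

# Write one "name<TAB>length" line per contig
with open('file3', 'w') as out:
    for key in output:
        out.write(key + '\t' + output[key] + '\n')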
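If the end goal is coverage rather than just a list of lengths, you could attach the lengths to each alignment row directly. This is only a sketch and it makes assumptions the question does not confirm: that file2 uses the default 12-column tabular layout (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and that both query and subject names can be found in file1. The helper names `load_lengths` and `coverage_rows` are made up for illustration.

```python
def load_lengths(lines):
    """Parse 'name length' lines into a dict of name -> int length."""
    lengths = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            lengths[parts[0]] = int(parts[1])
    return lengths


def coverage_rows(blast_lines, lengths):
    """Return (qseqid, sseqid, query_cov, subject_cov) for each alignment row.

    Assumes the 12-column tabular layout; coverage is a fraction of the
    sequence length, or None when the length is not in the lookup.
    """
    rows = []
    for line in blast_lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 12:
            continue  # skip comment, blank, or malformed lines
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])
        qlen = lengths.get(qseqid)
        slen = lengths.get(sseqid)
        # abs() because subject coordinates are reversed for minus-strand hits
        qcov = (abs(qend - qstart) + 1) / qlen if qlen else None
        scov = (abs(send - sstart) + 1) / slen if slen else None
        rows.append((qseqid, sseqid, qcov, scov))
    return rows


# Usage with real files (paths taken from the question):
# with open('file1') as f1, open('file2') as f2:
#     for row in coverage_rows(f2, load_lengths(f1)):
#         print(*row, sep='\t')
```

Note that this gives coverage per alignment row. If a contig pair has several rows, you would need to merge the overlapping intervals before dividing by the length, which is not done here.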