I am using Biopython to do something similar to the question "Sort rps-blast results by position of the hit", but I want to join (concatenate) the local hits so that I get contiguous stretches of query and subject sequence.
My code:
    for record in records:
        for alignment in record.alignments:
            hits = sorted((hsp.query_start, hsp.query_end, hsp.sbjct_start, hsp.sbjct_end,
                           alignment.title, hsp.query, hsp.sbjct)
                          for hsp in alignment.hsps)
            for q_start, q_end, sb_start, sb_end, title, query, sbjct in hits:
                print title
                print 'The query starts from position: ' + str(q_start)
                print 'The query ends at position: ' + str(q_end)
                print 'The hit starts at position: ' + str(sb_start)
                print 'The hit ends at position: ' + str(sb_end)
                print 'The query is: ' + query
                print 'The hit is: ' + sbjct
This gives sorted results like this:
Species_1
The query starts from position: 1
The query ends at position: 184
The hit starts at position: 1
The hit ends at position: 552
The query is: #######query_seq
The hit is: ######### hit_seq
Species_1
The query starts from position: 390
The query ends at position: 510
The hit starts at position: 549
The hit ends at position: 911
The query is: #######query_seq
The hit is: ######### hit_seq
Species_1
The query starts from position: 492
The query ends at position: 787
The hit starts at position: 889
The hit ends at position: 1776
The query is: #######query_seq
The hit is: ######### hit_seq
This is all fine, but I want to take the next logical step: concatenate the three sub-queries and sub-hits shown here (the number of hits varies) to get the complete query and subject sequences. What would be the way forward?
OK, here is a sample solution. I hope it helps!
You can create an empty string before the loop over the hits and append each query fragment to it. Here is an edit of your code:
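    for record in records:
        for alignment in record.alignments:
            # Start a fresh string per alignment so fragments from different subjects are not mixed
            expected_query_seq = ""
            hits = sorted((hsp.query_start, hsp.query_end, hsp.sbjct_start, hsp.sbjct_end,
                           alignment.title, hsp.query, hsp.sbjct)
                          for hsp in alignment.hsps)
            for q_start, q_end, sb_start, sb_end, title, query, sbjct in hits:
                print title
                print 'The query starts from position: ' + str(q_start)
                print 'The query ends at position: ' + str(q_end)
                print 'The hit starts at position: ' + str(sb_start)
                print 'The hit ends at position: ' + str(sb_end)
                print 'The query is: ' + query
                print 'The hit is: ' + sbjct
                # hsp.query is already the aligned fragment, so append it whole.
                # q_start and q_end are positions in the full query, not indices
                # into this fragment, so slicing with them would cut the wrong region.
                expected_query_seq += query
            # Fragments are simply joined in query order; overlapping hits are not trimmed
            print expected_query_seq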
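Note that in your sample output the second and third hits overlap on the query (390-510 and 492-787), so plain concatenation would repeat the overlapping residues. Below is a minimal sketch of one way to join both the query and the subject fragments while trimming such overlaps. It is not a Biopython feature: `join_hsps` is a hypothetical helper name, and it assumes each hit is a `(query_start, query_end, query, sbjct)` tuple with 1-based inclusive coordinates that count residues of the aligned query string (gaps written as `-`). Regions of the query that no hit covers (185-389 in your example) are simply skipped.

```python
def join_hsps(hsps):
    """Join aligned fragments in query order, trimming overlaps on the query.

    hsps: iterable of (query_start, query_end, query, sbjct) tuples, where
    query and sbjct are equal-length aligned strings (gaps as '-') and the
    coordinates are 1-based, inclusive positions on the full query.
    Returns (joined_query, joined_sbjct).
    """
    joined_query = []
    joined_sbjct = []
    covered_to = 0  # last query position already included in the output

    for q_start, q_end, query, sbjct in sorted(hsps):
        if q_end <= covered_to:
            continue  # fragment adds nothing new on the query

        # Query residues at the start of this fragment that are already covered
        overlap = max(0, covered_to - q_start + 1)

        # Walk alignment columns until that many non-gap query residues are skipped
        col = 0
        while overlap > 0 and col < len(query):
            if query[col] != "-":
                overlap -= 1
            col += 1

        # Cut query and subject at the same column so they stay aligned
        joined_query.append(query[col:])
        joined_sbjct.append(sbjct[col:])
        covered_to = q_end

    return "".join(joined_query), "".join(joined_sbjct)


# Toy data (made up for illustration): the second fragment overlaps the first
example = [(3, 7, "CD-EFG", "cdxefg"), (1, 4, "ABCD", "abcd")]
full_query, full_sbjct = join_hsps(example)
print(full_query)
print(full_sbjct)
```

With your loop, the input could be built per alignment as `[(hsp.query_start, hsp.query_end, hsp.query, hsp.sbjct) for hsp in alignment.hsps]`. If you want the sequences without alignment gaps, call `.replace("-", "")` on each returned string.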