I have a FASTA file with a few sequences, and I would like to slide a window of size 5 along each sequence and extract the subsequence at every position the window covers.
For example (test1.fasta):
>human1
ATCGCGTC
>human2
ATTTTCGCGA
Expected output (test1_out.txt):
>human1
ATCGC
>human1
TCGCG
>human1
CGCGT
>human1
GCGTC
>human2
ATTTT
>human2
TTTTC
>human2
TTTCG
>human2
TTCGC
>human2
TCGCG
>human2
CGCGA
My code below is only able to extract the first five base pairs. How can I shift the window so that it extracts 5 bp at every position, with a step size of 1 and a window size of 5?
from Bio import SeqIO

with open("test1_out.txt", "w") as f:
    for seq_record in SeqIO.parse("test1.fasta", "fasta"):
        f.write(str(seq_record.id) + "\n")
        f.write(str(seq_record.seq[:5]) + "\n")  # first 5 base positions
I got the above code from another example on Stack Overflow.
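I assume seq_record.seq holds the whole DNA sequence of a record, e.g. "ATCGCGTC" for human1. A sequence of length n has n - 4 windows of size 5, so you can loop over every start position i and slice out seq[i:i+5]. You can write it like this: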
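from Bio import SeqIO

with open("test1_out.txt", "w") as f:
    for seq_record in SeqIO.parse("test1.fasta", "fasta"):
        # one iteration per window start; a sequence of length n has n - 4 windows of size 5
        for i in range(len(seq_record.seq) - 4):
            # write the ">" header so the output matches the expected FASTA-style format
            f.write(">" + str(seq_record.id) + "\n")
            f.write(str(seq_record.seq[i:i+5]) + "\n")  # 5-base window starting at position i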
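If you want the window size and step size to be adjustable, the same loop can be pulled into a small helper. This is only a sketch: the name sliding_windows and its window/step parameters are made up for illustration and are not part of Biopython, and it works on plain strings so it can be tried without a FASTA file.

```python
def sliding_windows(seq, window=5, step=1):
    """Yield each full-length window of `seq`, advancing `step` positions at a time.

    Hypothetical helper for illustration; not a Biopython function.
    """
    # The last valid start is len(seq) - window, so the range stops one past it.
    # Sequences shorter than `window` produce an empty range and yield nothing.
    for start in range(0, len(seq) - window + 1, step):
        yield seq[start:start + window]


# The two records from test1.fasta, hard-coded here as plain strings.
records = [("human1", "ATCGCGTC"), ("human2", "ATTTTCGCGA")]

for name, seq in records:
    for chunk in sliding_windows(seq):
        print(">" + name)
        print(chunk)
```

With Biopython you would pass str(seq_record.seq) to the helper inside the SeqIO.parse loop and write to the file instead of printing. Note that sequences shorter than the window are skipped silently, which may or may not be what you want.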