I have Illumina paired-end reads for E. coli that I used to create a draft assembly (SPAdes). I now have the task of creating the input BAM files that I will use with Pilon -- which is used to improve a draft assembly.
I decided to employ BWA using the documentation here: http://bio-bwa.sourceforge.net/bwa.shtml#3
The plan is to create an index of the reference genome, create the alignments, and then convert them to BAM files.
Here is the command I used to index the reference:
bwa index -p bwa_indices/B055 temp/spades/scaffolds.fasta
This command output the following files: B055.amb B055.ann B055.bwt B055.pac B055.sa
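Since the BWA subcommands locate an index by its prefix (here the value given to -p), it can help to confirm which string actually resolves to those five files. The following is only a sketch: the helper name `has_bwa_index` is hypothetical and not part of BWA, and it checks just the five extensions listed above.

```shell
# has_bwa_index PREFIX
# Succeeds only if PREFIX.amb, .ann, .bwt, .pac and .sa all exist.
# Hypothetical helper for illustration; not a BWA command.
has_bwa_index() {
  local prefix="$1" ext
  for ext in amb ann bwt pac sa; do
    if [ ! -f "${prefix}.${ext}" ]; then
      # Report the first missing file on stderr and fail.
      echo "missing ${prefix}.${ext}" >&2
      return 1
    fi
  done
}
```

With the index command above, `has_bwa_index bwa_indices/B055` should succeed, while `has_bwa_index temp/spades/scaffolds.fasta` should fail, because no index files were written under that name.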
The next step is to generate the alignment files -- for which I used the following command:
bwa aln -t 20 temp/spades/scaffolds.fasta temp/spades/corrected/B055_S5_R1_filtered_1P.fastq.00.0_0.cor.fastq.gz > bwa_indices/B055_R1.sai
#bwa aln -t 20 temp/spades/scaffolds.fasta temp/spades/corrected/B055_S5_R1_filtered_2P.fastq.00.0_0.cor.fastq.gz > bwa_indices/B055_R2.sai
After running the first command, I received the following output:
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln] fail to locate the index
The last line has vexed me a bit. There is an output file (B055_R1.sai), but it is empty.
I can see that my alignment command makes no reference to any of the index files created earlier, but the documentation (http://bio-bwa.sourceforge.net/bwa.shtml) shows no option for specifying them. Some searching led me to a site that said the reference FASTA file must be in the same directory as the index files, and I even renamed my draft assembly from scaffolds.fasta to B055.fasta, but to no avail. I also unzipped the fastq.gz file and changed the extension from fastq to fq, again without success. Those may still be issues, but it seems to me that pointing the bwa aln call at the index files is the most pressing one.
Can anyone kindly point me in the proper direction? I am using BWA Version: 0.7.5a-r405 (I also installed the latest version (Version: 0.7.12-r1039) with no improvement), CentOS 6.7, with 34 cores and plenty of memory.
Thank you in advance.
Based on some suggestions by someone in a different forum, I changed the names of my files such that they were consistent across the board.
mkdir -p bwa_indices
bwa index -p B055 -a is B055.fa
bwa aln -t 20 B055.fa ../temp/spades/corrected/B055_S5_R1_filtered_1P.fq > B055_R1.sai
However, I was still receiving the error. I believe that this was an issue of outdated/incorrect documentation.
The documentation (http://bio-bwa.sourceforge.net/bwa.shtml#3) has the following for aligning (note the in.db.fasta):
aln bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] <in.db.fasta> <in.query.fq> > <out.sai>
I had been using the following (I tried both .fa and .fasta extensions):
bwa aln -t 20 B055.fa B055_R1_1P.fq > B055_R1.sai
I removed the .fa extension, so that the argument matches the index prefix given to -p (B055) rather than the FASTA file name, and it ran:
bwa aln -t 20 B055 B055_R1_1P.fq > B055_R1.sai
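For the remaining steps toward a BAM file for Pilon, a minimal sketch follows. It is not tested against this data set and rests on several assumptions: the function names `run` and `align_pair` are hypothetical, the output file names are made up for illustration, and `samtools sort -o` assumes samtools 1.3 or later. Setting DRY_RUN=1 prints the commands instead of running them, which makes it easy to check that every bwa call receives the index prefix rather than the FASTA name.

```shell
# run CMD [ARGS...]
# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# align_pair PREFIX R1.fq R2.fq [THREADS]
# PREFIX is the index prefix passed to `bwa index -p`, not the FASTA file.
# Output names (PREFIX_R1.sai, PREFIX.sam, ...) are illustrative choices.
align_pair() {
  local prefix="$1" r1="$2" r2="$3" threads="${4:-20}"
  # One .sai file per read file (-f writes to a file instead of stdout).
  run bwa aln -t "$threads" -f "${prefix}_R1.sai" "$prefix" "$r1" &&
  run bwa aln -t "$threads" -f "${prefix}_R2.sai" "$prefix" "$r2" &&
  # Combine both .sai files and both read files into paired-end SAM.
  run bwa sampe -f "${prefix}.sam" "$prefix" \
      "${prefix}_R1.sai" "${prefix}_R2.sai" "$r1" "$r2" &&
  # Coordinate-sort and index the alignments (assumes samtools >= 1.3).
  run samtools sort -o "${prefix}.sorted.bam" "${prefix}.sam" &&
  run samtools index "${prefix}.sorted.bam"
}
```

For example, `DRY_RUN=1 align_pair B055 B055_R1_1P.fq B055_R1_2P.fq` would list the five commands; the second read file name is a guess at the mate file and should be replaced with the real one. The resulting sorted, indexed BAM is the kind of file Pilon accepts through its --frags option.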