
How can I create a FASTQ sequence file?


I have a genomic database that contains a plain character sequence (like >chr1 AGTGTCA.....). I want to convert it to the standard FASTQ format, like this:

@HWUSI-EAS594-R:1:3:1453:1350#0/1 
CCCAGTTCCGACGATCGATTTGCACGTCAGAATCGCTACGGACCTCCATCAGGGTTTCCCCTGACTTCGTCCTGACCAGG
+   
ea^cdfdffgggggggggggeggggdggdffgdbdgddgggg`g^dfbfgdggcfbgfffcb]gffbfcfcefbbBBBB

I don't understand this format well enough to do the conversion myself. How can I convert a plain character sequence to the FASTQ format (as in the example above)?

Specifically, I am asking:

  1. Is there any existing code to do the encoding?
  2. If not, how can I encode the character sequence in FASTQ? What does this format imply and how can I create it?

Solution

  • A FASTQ record stores a quality (reliability) score for every base alongside the sequence itself. Because you have only the sequence and not the quality scores from the sequence derivation, I don't think you have enough information to construct a meaningful FASTQ file. (I am not a bioinformatics expert, however.) Instead, you should probably keep using the FASTA file format, which contains only the sequence information.
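
If a downstream tool insists on FASTQ input anyway, one possible workaround is to write each sequence with a made-up, constant quality string. The sketch below is only an illustration under stated assumptions: the function names `parse_fasta` and `fasta_to_fastq` are hypothetical, the placeholder character `I` assumes the Phred+33 encoding (where it stands for a score of 40), and the resulting qualities carry no real information about the data.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(text, placeholder="I"):
    """Return FASTQ text with a constant placeholder quality per base.

    ASSUMPTION: the qualities are invented. 'I' means Q40 only under
    the Phred+33 encoding; it says nothing about the real data.
    """
    records = []
    for name, seq in parse_fasta(text):
        # Four lines per record: @name, sequence, '+', quality string.
        # The quality string must be exactly as long as the sequence.
        records.append(f"@{name}\n{seq}\n+\n{placeholder * len(seq)}\n")
    return "".join(records)


if __name__ == "__main__":
    print(fasta_to_fastq(">chr1\nAGTG\nTCA\n"), end="")
```

FASTQ is normally used for short sequencing reads, so writing a whole chromosome as a single record may not be what a FASTQ-consuming tool expects. Staying with FASTA remains the safer choice when no real quality data exists.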