Tags: bioinformatics, biopython, fasta, bioconductor, bioperl

multiFASTA file processing


I was curious to know if there is any bioinformatics tool out there that can process a multi-FASTA file and report information such as the number of sequences, their lengths, and their nucleotide/amino acid composition, and maybe automatically draw descriptive plots. An R/Bioconductor solution or a BioPerl module would also do, but I didn't manage to find anything.

Can you help me? Thanks a lot :-)


Solution

  • EMBOSS is a collection of small command-line tools, several of which can help you out (for example, infoseq reports the length of each sequence).

    To count the number of FASTA entries, I use:

        grep -c '^>' mySequences.fasta

    To make sure no header line is duplicated, I check that this command gives the same number:

        grep '^>' mySequences.fasta | sort | uniq | wc -l
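The grep one-liners only look at header lines. As a minimal sketch going a little further toward what the question asks, assuming nucleotide sequences and a POSIX awk, the two shell functions below print one tab-separated line per record (identifier, length, GC percentage) and list any header that occurs more than once. The names fasta_stats and dup_headers are hypothetical helpers for illustration, not part of EMBOSS or any package; GC percentage is meaningless for protein sequences, and wrapped sequence lines are summed into one record.

```shell
# Hypothetical helper: per-record length and GC% for a (multi)FASTA file.
# Output columns: identifier <TAB> length <TAB> GC percentage.
fasta_stats() {
  awk '
    # Print the record collected so far, if there is one.
    function flush() {
      if (n == 0) return
      gc_pct = (len > 0) ? 100 * gc / len : 0
      printf "%s\t%d\t%.2f\n", name, len, gc_pct
    }
    /^>/ {
      flush()
      n++
      # Identifier: text after ">" up to the first whitespace.
      name = substr($1, 2)
      len = 0; gc = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")        # drop stray spaces and CR line endings
      len += length($0)
      gc += gsub(/[GCgc]/, "&")  # gsub returns the number of G/C matches
    }
    END { flush() }
  ' "$1"
}

# Hypothetical helper: print each header line that appears more than once
# (prints nothing when all headers are unique).
dup_headers() {
  grep '^>' "$1" | sort | uniq -d
}
```

For example, fasta_stats mySequences.fasta > stats.tsv gives one row per sequence, so wc -l stats.tsv matches the grep -c count, and the table can be read into R (or any plotting tool) to draw length or GC distributions.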