I was curious to know whether there is any bioinformatics tool that can process a multi-FASTA file and report things like the number of sequences, their lengths, nucleotide/amino acid content, etc., and maybe automatically draw descriptive plots. An R Bioconductor solution or a BioPerl module would also do, but I didn't manage to find anything.
Can you help me? Thanks a lot :-)
EMBOSS is a collection of small tools, and some of them can help you out.
seqstats returns sequence length.
pepstats should give you amino acid content etc.
Some of the tools also offer plotting functions. Very handy.
http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/groups.html
To count the number of FASTA entries, I use:
grep -c '^>' mySequences.fasta
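Counting headers gives the number of sequences; if you also want the length of each one, something along these lines might work. This is only a minimal sketch: `fasta_lengths` is a made-up helper name, and it assumes plain FASTA where every record starts with a `>` line and the ID is the first word of the header.

```shell
# Hypothetical helper: print "<id><TAB><length>" for every record in a multi-FASTA file.
fasta_lengths() {
  awk '
    /^>/ {                              # a header line starts a new record
      if (n++) print id "\t" len        # flush the previous record, if there was one
      id = substr($1, 2)                # ID = first word of the header, minus ">"
      len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")               # drop whitespace and stray carriage returns
      len += length($0)                 # add this sequence line to the running total
    }
    END { if (n) print id "\t" len }    # flush the last record
  ' "$1"
}
```

Usage would be `fasta_lengths mySequences.fasta`, and the output can be piped into `sort -k2,2n` to order the entries by length.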
To make sure none of the entries are duplicated (judging by header line), I check that I get the same number from:
grep '^>' mySequences.fasta | sort | uniq | wc -l
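If the two numbers differ, it helps to see which headers are the repeated ones. A small sketch using `uniq -d`, which prints only lines that occur more than once in sorted input; `duplicate_headers` is a hypothetical helper name, and note that it compares whole header lines, not the sequences themselves.

```shell
# Hypothetical helper: print each header line that occurs more than once.
duplicate_headers() {
  grep '^>' "$1" | sort | uniq -d
}
```

Running `duplicate_headers mySequences.fasta` prints nothing when all headers are unique.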