I need to calculate the number and percentage of polar/non-polar and aliphatic/aromatic/heterocyclic amino acids in a protein sequence that I got from UniProt, using BioJava.
I found in the BioJava tutorial how to read FASTA files and implemented the code below, but I have no idea how to approach the counting itself.
If you have any ideas, please help me.
Maybe there are some sources where I can look this up.
This is the code:
package biojava.biojava_project;

import java.net.URL;

import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.io.FastaReaderHelper;

public class BioSeq {

    // Fetch the sequence for the given accession from UniProt
    public static ProteinSequence getSequenceForId(String uniProtId) throws Exception {
        // %s is replaced by the accession, so the method works for any ID
        URL uniprotFasta = new URL(String.format("https://rest.uniprot.org/uniprotkb/%s.fasta", uniProtId));
        ProteinSequence seq = FastaReaderHelper.readFastaProteinSequence(uniprotFasta.openStream()).get(uniProtId);
        // Print the ID, the sequence, and the original FASTA header
        System.out.printf("id : %s %s%s%s", uniProtId, seq, System.getProperty("line.separator"), seq.getOriginalHeader());
        System.out.println();
        return seq;
    }

    public static void main(String[] args) {
        try {
            System.out.println(getSequenceForId("P31574"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I don't know whether BioJava stores these properties anywhere, but it's easy to list the amino acids that have each property manually, then iterate over the sequence and count the residues that match. Here's an example for polarity:
import java.io.InputStream;
import java.net.URL;
import java.util.Set;

import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompound;
import org.biojava.nbio.core.sequence.io.FastaReaderHelper;

public class BioSeq {

    public static void main(String[] args) throws Exception {
        ProteinSequence seq = loadFromUniprot("P31574");
        // Polar AAs as one-letter codes. This set holds the uncharged polar residues;
        // add D, E, K, R, H if charged residues should also count as polar.
        int polarCount = numberOfOccurrences(seq, Set.of("Y", "S", "T", "N", "Q", "C"));
        // Multiply by 100 so the printed value really is a percentage
        System.out.println("% of polar AAs: " + 100.0 * polarCount / seq.getLength());
    }

    public static ProteinSequence loadFromUniprot(String uniProtId) throws Exception {
        URL uniprotFasta = new URL(String.format("https://rest.uniprot.org/uniprotkb/%s.fasta", uniProtId));
        // try-with-resources closes the stream even if parsing fails
        try (InputStream is = uniprotFasta.openStream()) {
            return FastaReaderHelper.readFastaProteinSequence(is).get(uniProtId);
        }
    }

    // Counts the residues whose one-letter code is in the given set
    private static int numberOfOccurrences(ProteinSequence seq, Set<String> bases) {
        int count = 0;
        for (AminoAcidCompound aminoAcid : seq) {
            if (bases.contains(aminoAcid.getBase())) {
                count++;
            }
        }
        return count;
    }
}
PS: don't forget to close IO streams after you've used them. In the example above I used the try-with-resources syntax, which automatically closes the InputStream.
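The same counting idea can be extended to the other categories from the question. Below is a minimal sketch that works on the plain one-letter string (with BioJava this should be available via `seq.getSequenceAsString()`), so it runs without any library. The class name `AminoAcidClasses`, its helper methods, the toy sequence, and above all the group memberships are assumptions made for illustration: textbooks classify some residues differently (for example glycine, proline, methionine, or histidine), so adjust the strings to the definition you are required to use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava.
class AminoAcidClasses {

    // ASSUMED groupings (one-letter codes); check them against your course or reference.
    static final Map<String, String> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("aliphatic", "AVLI");     // open-chain hydrocarbon side chains
        CLASSES.put("aromatic", "FWY");       // side chains containing a benzene-type ring
        CLASSES.put("heterocyclic", "HWP");   // rings that contain a nitrogen atom
    }

    // Number of residues in the sequence whose one-letter code occurs in members.
    static int count(String sequence, String members) {
        int n = 0;
        for (char c : sequence.toUpperCase().toCharArray()) {
            if (members.indexOf(c) >= 0) {
                n++;
            }
        }
        return n;
    }

    // Share of matching residues as a percentage (0 for an empty sequence).
    static double percentage(String sequence, String members) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        return 100.0 * count(sequence, members) / sequence.length();
    }

    public static void main(String[] args) {
        // Toy sequence for demonstration only; replace it with the real one.
        String sequence = "MKWVTFHP";
        for (Map.Entry<String, String> e : CLASSES.entrySet()) {
            System.out.printf("%s: %d (%.1f%%)%n",
                    e.getKey(),
                    count(sequence, e.getValue()),
                    percentage(sequence, e.getValue()));
        }
    }
}
```

Note that a residue can belong to more than one group here (tryptophan is both aromatic and heterocyclic under these assumptions), so the percentages are not expected to add up to 100.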