k-means  hierarchical-clustering  dna-sequence

How to do decimal encoding of DNA sequences (dataset)?


I need to perform k-means clustering and hierarchical clustering on DNA (nucleotide) sequences that I have downloaded in FASTA format. Before clustering, I need to encode the bases (A, T, C, G) as decimal numbers. How can I do that so that I can use the result as a matrix input in MATLAB?


Solution

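  • Use the nt2int function from the Bioinformatics Toolbox. It converts a nucleotide character sequence into a row vector of integers (by default A=1, C=2, G=3, T=4). Documentation on it below: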

    http://www.mathworks.com/help/bioinfo/ref/nt2int.html
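
A minimal sketch of how the encoding could feed into clustering, assuming the Bioinformatics Toolbox (fastaread, nt2int) and the Statistics and Machine Learning Toolbox (kmeans, linkage, cluster) are installed. The file name sequences.fasta, the number of clusters k, and the zero-padding of shorter sequences are illustrative assumptions, not part of the original question:

```matlab
% Read all records from a FASTA file (file name is a placeholder).
records = fastaread('sequences.fasta');

% Encode each sequence as integers (A=1, C=2, G=3, T=4 by default).
% nt2int returns uint8, so convert to double for the clustering functions.
encoded = arrayfun(@(r) double(nt2int(r.Sequence)), records, ...
                   'UniformOutput', false);

% Assumption: sequences may differ in length, so pad shorter ones with
% zeros to build a rectangular matrix (one row per sequence).
maxLen = max(cellfun(@numel, encoded));
X = zeros(numel(encoded), maxLen);
for i = 1:numel(encoded)
    X(i, 1:numel(encoded{i})) = encoded{i};
end

k = 3;                                  % assumed number of clusters

% K-means clustering on the encoded matrix.
kmeansIdx = kmeans(X, k);

% Hierarchical clustering: build the linkage tree, then cut it into k groups.
Z = linkage(X, 'average');
hierIdx = cluster(Z, 'maxclust', k);
```

Note that the integer codes impose an arbitrary ordering on the bases (for example, A ends up numerically closer to C than to T), so Euclidean distances on this matrix are only a rough proxy for sequence similarity. Aligning the sequences first, or using a different feature representation, may be more appropriate depending on the data.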