Search code examples
rbioinformaticsdna-sequencephylogeny

Identifying mutations between haplotypes using haploNet (pegas) in R


I think this might be an easy question, but I could not solve it after reading the pegas documentation. I want to plot an haplotype network using a FASTA file and identify which mutations are separatting the distinct haplotypes.

Example:

fa <- read.FASTA("example.fa")
haps <- haplotype(fa)
haps50 <- subset(haps, minfreq = 50)
(network <- haploNet(haps50))

plot(network, size = attr(network, "freq"),show.mutation=1,labels=T)

enter image description here

How can I identify the position of the mutation in my FASTA file that is separating for example haplotype XX from V?

Extra question: Would it be also possible to know for example, what is the haplotype sequence of one of the haplotypes? For example haplotype V, which is quite frequent?


Solution

  • The pegas package contains a function diffHaplo, which specifies differences between haplotypes.

    diffHaplo(haps50, a = "XX", b = "V")
    

    To extract the DNA sequence of the frequent haplotype V, the indexes in the object of class haplotype will identify, which DNA sequence contains the haplotype.

    # haplotype index from its name
    i <- which(attr(haps50, "dimnames")[[1]] == "V")
    # index of the first sequence with the haplotype
    s <- attr(haps50, "index")[[i]][1]
    

    The respective sequence can then be identified in the alignment fa to save or print on screen.

    write.dna(fa[s], file = "hapV.fas", format = "fasta", nbcol = -1, colsep = "")
    paste(unlist(as.character(fa[s])), collapse = "")