Search code examples
ralignmentbioinformaticsfasta

How to output results of 'msa' package in R to fasta


I am using the R package msa, a core Bioconductor package, for multiple sequence alignment. Within msa, I am using the MUSCLE alignment algorithm to align protein sequences.

library(msa)
myalign <- msa("test.fa", method=c("Muscle"), type="protein",verbose=FALSE)

The test.fa file is a standard fasta as follows (truncated, for brevity):

>sp|P31749|AKT1_HUMAN_RAC
MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
>sp|P31799|AKT1_HUMAN_RAC
MSVVAIVKEGWLHKRGEYIKTWRFLL

When I run the code on the file, I get:

 MUSCLE 3.8.31   

Call:
   msa("test.fa", method = c("Muscle"), type = "protein", verbose = FALSE)

MsaAAMultipleAlignment with 2 rows and 480 columns
    aln 
[1] MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
[2] MSVVAIVKEGWLHKRGEYIKTWR---FLL
Con MS?VAIVKEGWLHKRGEYIKTWR???FLL

As you can see, a very reasonable alignment.

I want to write the gapped alignment, preferably without the consensus sequence (e.g., Con row), to a fasta file. So, I want:

>sp|P31749|AKT1_HUMAN_RAC
MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
>sp|P31799|AKT1_HUMAN_RAC
MSVVAIVKEGWLHKRGEYIKTWR---FLL

I checked the msa help, and the package does not seem to have a built in method for writing out to any file type, fasta or otherwise.

The seqinr package looks somewhat promising, because maybe it could read this output as an msf format, albeit a weird one. However, seqinr seems to need a file read in as a starting point. I can't even save this using write(myalign, ...).


Solution

  • I think you ought to follow the examples in the help pages that show input with a specific read function first, then work with the alignment:

    mySeqs <- readAAStringSet("test.fa")
    myAlignment <- msa(mySeqs)
    

    Then the rownames function will deliver the sequence names:

     rownames(myAlignment)
    [1] "sp|P31749|AKT1_HUMAN_RAC" "sp|P31799|AKT1_HUMAN_RAC"
    

    (Not what you asked for but possibly useful in the future.) Then if you execute:

    detail(myAlignment)   #function actually in Biostrings
    

    .... you get a text file in interactive mode that you can save

    2 29
    sp|P31749|AKT1_HUMAN_RAC   MSDVAIVKEG WLHKRGEYIK TWRPRYFLL
    sp|P31799|AKT1_HUMAN_RAC   MSVVAIVKEG WLHKRGEYIK TWR---FLL
    

    If you wnat to try hacking a function for which you can get a file written in code, then look at the Biostrings detail function code that is being used

    > showMethods( f= 'detail')
    Function: detail (package Biostrings)
    x="ANY"
    x="MsaAAMultipleAlignment"
        (inherited from: x="MultipleAlignment")
    x="MultipleAlignment"
    
    showMethods( f= 'detail', classes='MultipleAlignment', includeDefs=TRUE)
    Function: detail (package Biostrings)
    x="MultipleAlignment"
    function (x, ...) 
    {
        .local <- function (x, invertColMask = FALSE, hideMaskedCols = TRUE) 
        {
            FH <- tempfile(pattern = "tmpFile", tmpdir = tempdir())
            .write.MultAlign(x, FH, invertColMask = invertColMask, 
                showRowNames = TRUE, hideMaskedCols = hideMaskedCols)
            file.show(FH)
        }
        .local(x, ...)
    }