
BLOSUM62 not found in R


I'm struggling to load the BLOSUM62 matrix in R, which I understand should be in the Biostrings package. I'm new to R and am following an exercise that uses this matrix without problems, but when I run the code I get the error

dataset not found

I tried reinstalling Biostrings, but that did not fix it.

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("Biostrings")

library(Biostrings)
data("BLOSUM62")

In data("BLOSUM62") : dataset ‘BLOSUM62’ not found

I also tried downloading the matrix from NCBI. I loaded it from a .txt file and converted it into a 25x25 matrix:

BLOSUM62 <- read.table("BLOSUM62.txt", header = TRUE)
BLOSUM62 <- data.matrix(BLOSUM62)

That works, but when I then use it with another dataset (from an exercise we did in class) I get the error

Gene <- msa(Geneseq, substitutionMatrix = "blosum", method = "ClustalW")
cs <- msaConservationScore(Gene, BLOSUM62)

Error in msaConservationScore.matrix(mat, ...) : substitution matrix is not in proper format

How can I solve it?


Solution

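  • The BLOSUM62 matrix is provided by the pwalign package. In recent Bioconductor releases the substitution matrices were moved there from Biostrings, which is why data("BLOSUM62") no longer finds it when only Biostrings is loaded.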

    library(pwalign)
    data("BLOSUM62")
    
    print(BLOSUM62)
    
       A  R  N  D  C  Q  E  G  H  I ...
    A  4 -1 -2 -2  0 -1 -1  0 -2 -1 ... 
    R -1  5  0 -2 -3  1  0 -2  0 -3 ...
    N -2  0  6  1 -3  0  0  0  1 -3 ...
    D -2 -2  1  6 -3  0  2 -1 -1 -3 ...
    C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 ...
    Q -1  1  0  0 -3  5  2 -2  0 -3 ...
    E -1  0  0  2 -4  2  5 -2  0 -3 ...
    G  0 -2  0 -1 -3 -2 -2  6 -2 -4 ...
    H -2  0  1 -1 -3  0  0 -2  8 -3 ...
    I -1 -3 -3 -3 -1 -3 -3 -4 -3  4 ...
     ...
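
  • As for the "substitution matrix is not in proper format" error with the matrix read from the .txt file: one plausible cause (an assumption, not confirmed against the msa source) is that the row and column names no longer match. By default read.table() uses check.names = TRUE, which rewrites a column header such as "*" into a syntactically valid name, while the row names are left untouched. The sketch below reproduces this with a small hypothetical three-letter excerpt rather than the full downloaded file:

```r
# Hypothetical three-letter excerpt standing in for the downloaded file
# (not the full matrix; the real file has many more rows and columns)
txt <- "   A  X  *
A  4  0 -4
X  0 -1 -4
*  -4 -4  1"

# Default check.names = TRUE: the "*" column header is rewritten,
# so column names no longer match the row names
bad <- data.matrix(read.table(text = txt, header = TRUE))

# check.names = FALSE keeps the header exactly as written in the file
good <- as.matrix(read.table(text = txt, header = TRUE, check.names = FALSE))

identical(rownames(bad), colnames(bad))    # FALSE
identical(rownames(good), colnames(good))  # TRUE
```

    If you do need to read the matrix from a file, passing check.names = FALSE to read.table() may therefore be enough; loading BLOSUM62 from pwalign as shown above avoids the issue entirely.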